Abstract

The forgetting of the initial distribution for discrete-time Hidden Markov Models (HMM) is addressed: a new set of conditions is proposed to establish the forgetting property of the filter at a polynomial and at a geometric rate. Both a pathwise-type convergence of the total variation distance between the filters started from two different initial distributions and a convergence in expectation are considered. The results are illustrated using several HMMs of interest: the dynamic tobit model, the non-linear state space model and the stochastic volatility model.
1 Definition and notations

A Hidden Markov Model (HMM) is a doubly stochastic process with an underlying Markov chain that is not directly observable. More specifically, let $\mathsf X$ and $\mathsf Y$ be two spaces equipped with countably generated $\sigma$-fields $\mathcal X$ and $\mathcal Y$, and denote by $Q$ and $G$, respectively, a Markov transition kernel on $(\mathsf X,\mathcal X)$ and a transition kernel from $(\mathsf X,\mathcal X)$ to $(\mathsf Y,\mathcal Y)$. Consider the Markov transition kernel defined for any $(x,y)\in\mathsf X\times\mathsf Y$ and $C\in\mathcal X\otimes\mathcal Y$ by

$$T[(x,y),C] := Q\otimes G[(x,y),C] = \iint Q(x,dx')\,G(x',dy')\,\mathbf 1_C(x',y'). \tag{1}$$

We consider $\{X_k,Y_k\}_{k\ge0}$, the Markov chain with transition kernel $T$ and initial distribution $\nu\otimes G(C) := \iint\nu(dx)G(x,dy)\mathbf 1_C(x,y)$, where $\nu$ is a probability measure on $(\mathsf X,\mathcal X)$. We assume that the chain $\{X_k\}_{k\ge0}$ is not observable (hence the name hidden). The model is said to be partially dominated if there exists a measure $\mu$ on $(\mathsf Y,\mathcal Y)$ such that for all $x\in\mathsf X$, $G(x,\cdot)$ is absolutely continuous with respect to $\mu$; in such case, the joint transition kernel $T$ can be written as

$$T[(x,y),C] = \iint Q(x,dx')\,g(x',y')\,\mathbf 1_C(x',y')\,\mu(dy'), \quad C\in\mathcal X\otimes\mathcal Y, \tag{2}$$

where $g(x,\cdot) := dG(x,\cdot)/d\mu$ denotes the Radon-Nikodym derivative of $G(x,\cdot)$ with respect to $\mu$. To follow the usage in the filtering literature, $g(x,\cdot)$ is referred to as the likelihood of the observation. An example of such a type of dependence is $X_{k+1}=a(X_k,\zeta_{k+1})$ and $Y_k=b(X_k,\epsilon_k)$, where $\{\zeta_k\}_{k\ge0}$ and $\{\epsilon_k\}_{k\ge0}$ are i.i.d. sequences of random variables, and $\{\zeta_k\}_{k\ge0}$, $\{\epsilon_k\}_{k\ge0}$ and $X_0$ are independent. The most elementary example is the so-called linear Gaussian state space model (LGSSM), where $a$ and $b$ are linear and $\{(\zeta_k,\epsilon_k)\}_{k\ge0}$ are
i.i.d. standard Gaussian. We denote by $\phi_{\nu,n}[y_{0:n}]$ the distribution of the hidden state $X_n$ conditionally on the observations $y_{0:n} := [y_0,\dots,y_n]$, which is given by

$$\phi_{\nu,n}[y_{0:n}](A) := \frac{\nu[g(\cdot,y_0)Qg(\cdot,y_1)Q\cdots Qg(\cdot,y_n)\mathbf 1_A]}{\nu[g(\cdot,y_0)Qg(\cdot,y_1)Q\cdots Qg(\cdot,y_n)]} = \frac{\int_{\mathsf X^{n+1}}\nu(dx_0)g(x_0,y_0)\prod_{i=1}^{n}Q(x_{i-1},dx_i)g(x_i,y_i)\mathbf 1_A(x_n)}{\int_{\mathsf X^{n+1}}\nu(dx_0)g(x_0,y_0)\prod_{i=1}^{n}Q(x_{i-1},dx_i)g(x_i,y_i)}, \tag{3}$$

where $Qf(x)=Q(x,f):=\int Q(x,dx')f(x')$, for any function $f\in B^+(\mathsf X)$, the set of non-negative functions $f:\mathsf X\to\mathbb R$ such that $f$ is $\mathcal X/\mathcal B(\mathbb R)$-measurable, with $\mathcal B(\mathbb R)$ the Borel $\sigma$-algebra.

In practice the model is rarely known exactly, and so suboptimal filters are constructed by replacing the unknown transition kernel, likelihood function and initial distribution by suitable approximations.

The choice of these quantities plays a key role both when studying the convergence of sequential Monte Carlo methods and when analysing the asymptotic behaviour of the maximum likelihood estimator (see e.g. (8) or (5) and the references therein).

The simplest problem assumes that the transitions are known, so that the only error in the filter is due to a wrong initial condition. A typical question is whether $\phi_{\nu,n}[y_{0:n}]$ and $\phi_{\nu',n}[y_{0:n}]$ are close (in some sense) for large values of $n$ and two different choices $\nu$ and $\nu'$ of the initial distribution.

The forgetting property of the initial condition of the optimal filter in nonlinear state space models has attracted many research efforts, and it would be a formidable task to give credit to every contributor. The purpose of the short presentation of existing results below is mainly to allow a comparison of the assumptions and results presented in this contribution with those previously reported in the literature. The first result in this direction was obtained by (21), who established $L^p$-type convergence of the
optimal filter initialised with the wrong initial condition to the filter initialised with the true initial distribution (assuming that the transition kernels are known); however, their proof does not provide a rate of convergence. A new approach based on the Hilbert projective metric was later introduced in (2) to obtain the exponential stability of the optimal filter with respect to its initial condition. However, their results were based on stringent mixing conditions for the transition kernels; these conditions state that there exist positive constants $\epsilon_-$ and $\epsilon_+$ and a probability measure $\lambda$ on $(\mathsf X,\mathcal X)$ such that, for $f\in B^+(\mathsf X)$,

$$\epsilon_-\lambda(f)\le Q(x,f)\le\epsilon_+\lambda(f), \quad\text{for any } x\in\mathsf X. \tag{4}$$

This condition in particular implies that the chain is uniformly geometrically ergodic.
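For a finite state space, (4) holds as soon as every entry of $Q$ is positive (take $\lambda$ uniform), and the recursion behind (3) fits in a few lines. The sketch below uses a hypothetical two-state chain `Q` and a hypothetical likelihood table `g`, runs the normalised forward recursion from two different initial distributions on the same observation record, and reports $\sup_A|\phi_{\nu,k}(A)-\phi_{\nu',k}(A)|$ (half the $\ell^1$ distance; the normalisation of $\|\cdot\|_{\mathrm{TV}}$ used in the text may differ by a factor 2). Under (4) this distance is expected to shrink geometrically, although it need not decrease at every single step.

```python
import random


def normalise(w):
    """Scale non-negative weights so that they sum to one."""
    s = sum(w)
    return [wi / s for wi in w]


def predict(phi, Q):
    """Row vector phi Q: one prediction step of the filter."""
    n = len(phi)
    return [sum(phi[i] * Q[i][j] for i in range(n)) for j in range(n)]


def filters(nu, Q, g, ys):
    """Filtering distributions phi_{nu,k}[y_{0:k}] for k = 0, ..., len(ys) - 1.

    Normalising after every step gives the same measures as the ratio in (3).
    """
    states = range(len(nu))
    phi = normalise([nu[x] * g[x][ys[0]] for x in states])
    out = [phi]
    for y in ys[1:]:
        pred = predict(phi, Q)
        phi = normalise([pred[x] * g[x][y] for x in states])
        out.append(phi)
    return out


def tv(p, q):
    """sup_A |p(A) - q(A)|, i.e. half the l1 distance between probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical model: Q[x][x'] transition matrix, g[x][y] likelihood of y in {0, 1}.
Q = [[0.9, 0.1], [0.2, 0.8]]
g = [[0.7, 0.3], [0.2, 0.8]]

rng = random.Random(0)
x, ys = 0, []
for _ in range(60):
    ys.append(0 if rng.random() < g[x][0] else 1)  # Y_k drawn from g(X_k, .)
    x = 0 if rng.random() < Q[x][0] else 1          # X_{k+1} drawn from Q(X_k, .)

f1 = filters([1.0, 0.0], Q, g, ys)  # nu  = Dirac mass at state 0
f2 = filters([0.0, 1.0], Q, g, ys)  # nu' = Dirac mass at state 1
dist = [tv(a, b) for a, b in zip(f1, f2)]
print(dist[0], dist[10], dist[-1])
```

The two filters start at distance one; after a few dozen steps they are numerically indistinguishable for this choice of `Q`, whatever the observation record, since all likelihood values are positive.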
Similar results were obtained independently by (9) using the Dobrushin ergodicity coefficient (see (11) for further refinements under this assumption). The mixing condition was later weakened by (6), under the assumption that the kernel $Q$ is positive recurrent and is dominated by some reference measure $\lambda$:

$$\sup_{(x,x')\in\mathsf X\times\mathsf X}q(x,x')<\infty \quad\text{and}\quad \int\operatorname*{ess\,inf}_{x'}q(x,x')\,\pi(x)\,\lambda(dx)>0,$$

where $q(x,\cdot)=dQ(x,\cdot)/d\lambda$, $\operatorname{ess\,inf}$ is the essential infimum with respect to $\lambda$ and $\pi\,d\lambda$ is the stationary distribution of the chain $Q$. While the upper bound is reasonable, the lower bound is restrictive in many applications and fails to be satisfied e.g. for the linear Gaussian state space model.

In (18), the stability of the optimal filter is studied for a class of kernels referred to as pseudo-mixing. The definition of a pseudo-mixing kernel is adapted to the case where the state space is $\mathsf X=\mathbb R^d$, equipped with the Borel $\sigma$-field $\mathcal X$. A kernel $Q$ on $(\mathsf X,\mathcal X)$ is pseudo-mixing if for any compact set $C$ with a diameter $d$ large enough, there exist positive constants $\epsilon_-(d)>0$ and $\epsilon_+(d)>0$ and a measure $\lambda_C$ (which may be chosen to be finite without loss of generality) such that

$$\epsilon_-(d)\lambda_C(A)\le Q(x,A)\le\epsilon_+(d)\lambda_C(A), \quad\text{for any } x\in C,\ A\in\mathcal X. \tag{5}$$

This condition implies that for any $(x',x'')\in C\times C$,

$$\frac{\epsilon_-(d)}{\epsilon_+(d)}\le\operatorname*{ess\,inf}_{x\in\mathsf X}\frac{q(x',x)}{q(x'',x)}\le\operatorname*{ess\,sup}_{x\in\mathsf X}\frac{q(x',x)}{q(x'',x)}\le\frac{\epsilon_+(d)}{\epsilon_-(d)},$$

where $q(x,\cdot)=dQ(x,\cdot)/d\lambda_C$, and $\operatorname{ess\,sup}$ and $\operatorname{ess\,inf}$ denote the essential supremum and infimum with respect to $\lambda_C$. This condition is obviously more general than (4), but it is still not satisfied in the linear Gaussian case (see (18, Example 4.3)).

Several attempts have been made to establish stability under the so-called small noise condition. The first result in this direction was obtained by (2) (in continuous time), who considered an ergodic diffusion process with constant diffusion coefficient and linear observations: when the variance of the observation noise is sufficiently small, (2) established that the filter is exponentially stable. Small noise conditions also appeared (in a discrete-time setting) in (4) and (22). These results do not allow one to consider the linear Gaussian state space model with arbitrary noise variance.

A very significant step was achieved by (16), who considered the filtering problem of a Markov chain $\{X_k\}_{k\ge0}$ with values in $\mathsf X=\mathbb R^d$ filtered from observations $\{Y_k\}_{k\ge0}$ in $\mathsf Y=\mathbb R^\ell$,

$$X_{k+1}=X_k+b(X_k)+\sigma(X_k)\zeta_k, \qquad Y_k=h(X_k)+\beta\epsilon_k. \tag{6}$$

Here $\{(\zeta_k,\epsilon_k)\}_{k\ge0}$ is an i.i.d. sequence of standard Gaussian random vectors in $\mathbb R^{d+\ell}$, $b(\cdot)$ is a $d$-dimensional vector function, $\sigma(\cdot)$ a $d\times d$-matrix function, $h(\cdot)$ an $\ell$-dimensional vector function and $\beta>0$. The author established, under appropriate conditions on $b$, $h$ and $\sigma$, that the optimal filter forgets the initial conditions; these conditions cover (with some restrictions) the linear Gaussian state space model.
In this contribution, we propose a new set of conditions to establish the forgetting property of the filter, which are more general than those proposed in (16). In Theorem 1, a pathwise-type convergence of the total variation distance of the filter started from two different initial distributions is established, which is shown to hold almost surely w.r.t. the probability distribution of the observation process $\{Y_k\}_{k\ge0}$. Then, in Theorem 3, the convergence of the expectation of this total variation distance is shown, under more stringent conditions. The results are shown to hold under rather weak conditions on the observation process $\{Y_k\}_k$, which do not necessarily entail that the observations are from an HMM.

The paper is organised as follows. In Section 2, we introduce the assumptions and state the main results. In Section 3, we give sufficient conditions for Theorems 1 and 3 to hold when $\{Y_k\}_k$ is an HMM process, assuming that the transition kernel and the likelihood function might be different from those used in the definition of the filter. In Section 4, we illustrate the use of our assumptions on several examples with unbounded state spaces. The proofs are given in Sections 5 and 6.

2 Assumptions and Main results 

We say that a set $C\in\mathcal X$ satisfies the local Doeblin property (for short, $C$ is an LD-set) if there exist a measure $\lambda_C$ and constants $\epsilon_C^->0$ and $\epsilon_C^+>0$ such that $\lambda_C(C)>0$ and, for any $A\in\mathcal X$,

$$\epsilon_C^-\lambda_C(A\cap C)\le Q(x,A\cap C)\le\epsilon_C^+\lambda_C(A\cap C), \quad\text{for all } x\in C. \tag{7}$$

Locally Doeblin sets share some similarities with 1-small sets in the theory of Markov chains over general state spaces (see (20, Chapter 5)). Recall that a set $C$ is 1-small if there exist a measure $\tilde\lambda_C$ and $\tilde\epsilon_C>0$ such that $\tilde\lambda_C(C)>0$ and, for all $x\in C$ and $A\in\mathcal X$, $Q(x,A\cap C)\ge\tilde\epsilon_C\tilde\lambda_C(A\cap C)$. In particular, a locally Doeblin set is 1-small with $\tilde\epsilon_C=\epsilon_C^-$ and $\tilde\lambda_C=\lambda_C$. The main difference stems from the fact that we impose both a lower and an upper bound, and we impose that the minorizing and the majorizing measures are the same.

Compared to the pseudo-mixing condition (5), the local Doeblin property involves the trace of the Markov kernel $Q$ on $C$ and is thus much less restrictive. In particular, in contrast to the pseudo-mixing condition, it can easily be checked that for the kernel associated to the linear Gaussian state space model, every bounded Borel set $C$ is locally Doeblin.
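To make this claim concrete in the scalar case, the sketch below (our own illustration, not taken from the references) computes admissible constants in (7) for the kernel $Q(x,\cdot)=N(\phi x,\sigma^2)$ with $|\phi|<1$, the set $C=[-c,c]$ and $\lambda_C$ the uniform distribution on $C$. Since $Q(x,A\cap C)=\int_{A\cap C}2c\,q(x,x')\,\lambda_C(dx')$ with $q$ the Gaussian density, one may take $\epsilon_C^\pm$ to be the extreme values of $2c\,q(x,x')$ over $C\times C$: the maximum is attained at $x'=\phi x$ and the minimum where $|x'-\phi x|=c(1+|\phi|)$. The function names are hypothetical, and a brute-force grid search is included as a check.

```python
import math


def gauss_density(z, sigma):
    """Density of N(0, sigma^2) at z."""
    return math.exp(-z * z / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi * sigma * sigma)


def ld_constants(phi, sigma, c):
    """(eps_minus, eps_plus) in (7) for Q(x, .) = N(phi x, sigma^2), C = [-c, c], lambda_C uniform on C."""
    if not (abs(phi) < 1.0 and sigma > 0.0 and c > 0.0):
        raise ValueError("need |phi| < 1, sigma > 0 and c > 0")
    eps_plus = 2.0 * c * gauss_density(0.0, sigma)                    # x' = phi x lies in C
    eps_minus = 2.0 * c * gauss_density(c * (1.0 + abs(phi)), sigma)  # largest |x' - phi x| on C x C
    return eps_minus, eps_plus


def ld_constants_grid(phi, sigma, c, m=201):
    """Brute-force min and max of 2c q(x, x') over an m x m grid of C x C."""
    pts = [-c + 2.0 * c * k / (m - 1) for k in range(m)]
    vals = [2.0 * c * gauss_density(xp - phi * x, sigma) for x in pts for xp in pts]
    return min(vals), max(vals)


print(ld_constants(0.5, 1.0, 1.5), ld_constants_grid(0.5, 1.0, 1.5))
```

The ratio $\epsilon_C^+/\epsilon_C^-=\exp(c^2(1+|\phi|)^2/(2\sigma^2))$ is finite for every $c$ but grows quickly with $c$, which is one way to see why the pseudo-mixing condition, which requires such bounds with respect to $Q(x,\cdot)$ on the whole space, fails here while the local property holds.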

Let $V:\mathsf X\to[1,\infty)$ be a positive function and $A\in\mathcal X$ be a set. Define

$$\Upsilon_A(y) := \sup_{x\in A}g(x,y)\,QV(x)/V(x). \tag{8}$$

Consider the following assumptions:

(H1) For any $(x,y)\in\mathsf X\times\mathsf Y$, $g(x,y)>0$.

(H2) There exist a set $K\subset\mathsf Y$ and a function $V:\mathsf X\to[1,\infty)$ such that for any $\eta>0$, one may choose an LD-set $C\in\mathcal X$ satisfying

$$\Upsilon_{C^c}(y)\le\eta\,\Upsilon_{\mathsf X}(y), \quad\text{for all } y\in K.$$

Assumption (H1) can be relaxed, but it simplifies the statements of the results and the proofs. The case where the likelihood may vanish will be considered in a companion paper. Assumption (H2) involves both the likelihood function and the drift function. It is satisfied, for example, if there exists a set $K$ such that for all $\eta>0$, one can choose an LD-set $C$ so that

$$\sup_{x\in C^c}g(x,y)\le\eta\sup_{x\in\mathsf X}g(x,y), \quad\text{for all } y\in K, \tag{9}$$

in which case the previous assumption is satisfied with $V=1$. When $\mathsf X=\mathbb R^d$, this situation occurs for example when the compact sets are locally Doeblin and $\lim_{|x|\to\infty}\sup_{y\in K}g(x,y)=0$. As a simple illustration, this last property is satisfied for $Y_k=h(X_k)+\epsilon_k$ with $\lim_{|x|\to\infty}|h(x)|=\infty$ and $\{\epsilon_k\}_k$ i.i.d. random variables (independent of $\{X_k\}_k$) with a density $g$ which satisfies $\lim_{|x|\to\infty}g(x)=0$. More complex models satisfying (H2) are considered in Section 4.

When (9) is not satisfied, assumption (H2) can still be fulfilled if, for all $y\in\mathsf Y$, $\sup_{x\in\mathsf X}g(x,y)<\infty$, $\sup_{\mathsf X}QV/V<\infty$ for some function $V:\mathsf X\to[1,\infty)$, and, for all $\eta>0$, there exists an LD-set $C$ such that $\sup_{C^c}QV/V\le\eta$. As a simple illustration, this situation occurs for example with $X_{k+1}=\phi X_k+\sigma\zeta_k$, where $|\phi|<1$, $\sigma>0$ and $\{\zeta_k\}_k$ is a family of i.i.d. standard Gaussian vectors. More details are provided in Section 4.
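For the scalar Gaussian autoregression just mentioned, with the (assumed) choice $V(x)=e^{c|x|}$, the ratio $QV/V$ has a closed form, since $QV(x)$ is the mean of $e^{c|Z|}$ for $Z\sim N(\phi x,\sigma^2)$. The sketch below (hypothetical function names) evaluates it and computes a radius $R(\eta)$ such that $QV/V\le\eta$ for $|x|\ge R(\eta)$, using the elementary bound $QV(x)/V(x)\le2e^{c^2\sigma^2/2-c(1-|\phi|)|x|}$ obtained by bounding the Gaussian distribution function by one; the set $C=\{|x|\le R(\eta)\}$, which is locally Doeblin, then satisfies $\sup_{C^c}QV/V\le\eta$.

```python
import math


def norm_cdf(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def qv_over_v(x, phi, sigma, c):
    """QV(x)/V(x) for V(x) = exp(c|x|) and Q(x, .) = N(phi x, sigma^2).

    For Z ~ N(m, s^2): E[exp(c|Z|)] = exp(cm + c^2 s^2/2) Phi(m/s + cs)
                                     + exp(-cm + c^2 s^2/2) Phi(-m/s + cs).
    """
    m, s = phi * x, sigma
    half = 0.5 * c * c * s * s
    qv = (math.exp(c * m + half) * norm_cdf(m / s + c * s)
          + math.exp(-c * m + half) * norm_cdf(-m / s + c * s))
    return qv / math.exp(c * abs(x))


def radius(eta, phi, sigma, c):
    """R such that QV/V <= eta whenever |x| >= R.

    Uses QV(x)/V(x) <= 2 exp(c^2 sigma^2 / 2 - c (1 - |phi|) |x|).
    """
    return (0.5 * c * c * sigma * sigma + math.log(2.0 / eta)) / (c * (1.0 - abs(phi)))


R = radius(1e-3, 0.5, 1.0, 1.0)
print(R, qv_over_v(R, 0.5, 1.0, 1.0))
```

The same bound gives $\log(V^{-1}QV)(x)\le\log2+c^2\sigma^2/2-c(1-|\phi|)|x|$, an inequality of the multiplicative type that appears later in (27).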

For any LD-set $D$ and any probability measure $\nu$ on $(\mathsf X,\mathcal X)$, define

$$\Phi_{\nu,D}(y_0,y_1) := \nu[g(\cdot,y_0)Qg(\cdot,y_1)\mathbf 1_D], \tag{10}$$

$$\Psi_D(y) := \lambda_D(g(\cdot,y)\mathbf 1_D). \tag{11}$$

We denote by $(\Omega,\mathcal A)$ a measurable space, and we let $\{Y_k\}_{k\ge0}$ be a stochastic process on $(\Omega,\mathcal A)$ which takes values in $(\mathsf Y,\mathcal Y)$ but which is not necessarily the observation of an HMM. For any probability measure $\nu$ and any $n\in\mathbb N$, the filtering distribution $\phi_{\nu,n}[Y_{0:n}]$ (defined in (3)) is a measure-valued random variable on $(\Omega,\mathcal A)$.

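Theorem 1 Assume (H1-2) and let $\mathbb P^\star$ be a probability measure on $(\Omega,\mathcal A)$. Assume in addition that for some LD-set $D$ and some constants $M>0$ and $\gamma\in(0,1)$,

$$\liminf_{n\to\infty}n^{-1}\sum_{i=0}^{n}\mathbf 1_K(Y_i)\ge(1+\gamma)/2, \quad\mathbb P^\star\text{-a.s.}, \tag{12}$$

$$\limsup_{n\to\infty}n^{-1}\sum_{i=0}^{n}\log\Upsilon_{\mathsf X}(Y_i)<M, \quad\mathbb P^\star\text{-a.s.}, \tag{13}$$

$$\liminf_{n\to\infty}n^{-1}\sum_{i=0}^{n}\log\Psi_D(Y_i)>-M, \quad\mathbb P^\star\text{-a.s.}, \tag{14}$$

where $\Upsilon_{\mathsf X}$ and $\Psi_D$ are defined in (8) and (11), respectively. Then, for any initial distributions $\nu$ and $\nu'$ on $(\mathsf X,\mathcal X)$ such that $\nu(V)+\nu'(V)<\infty$, $\nu Q\mathbf 1_D>0$ and $\nu'Q\mathbf 1_D>0$, there exists a positive constant $c$ such that

$$\limsup_{n\to\infty}n^{-1}\log\|\phi_{\nu,n}[Y_{0:n}]-\phi_{\nu',n}[Y_{0:n}]\|_{\mathrm{TV}}<-c, \quad\mathbb P^\star\text{-a.s.} \tag{15}$$

Remark 2 We stress that it is not necessary to assume that $\{Y_k\}_{k\ge0}$ is the observation of an HMM $\{X_k,Y_k\}_{k\ge0}$. Conditions (13) and (14) can be verified, for example, under a variety of weak dependence conditions, the only requirement being basically the ability to prove a LLN (see for example (7)). This is of interest because in many applications the HMM is not correctly specified, but it is still of interest to establish the forgetting properties of the filtering distribution with respect to the initial distribution.

We now state a result allowing us to control the expectation of the total variation distance.

Theorem 3 Assume (H2). Let $D$ be an LD-set. Then, for any $M_i>0$, $i=0,1,2$, and $\gamma\in(0,1)$, there exists $\beta\in(0,1)$ such that, for any given initial distributions $\nu$ and $\nu'$ on $(\mathsf X,\mathcal X)$ and all $n$,

$$\mathbb E^\star\big(\|\phi_{\nu,n}[Y_{0:n}]-\phi_{\nu',n}[Y_{0:n}]\|_{\mathrm{TV}}\big)\le\beta^n[1+\nu(V)\nu'(V)]+r_0(\nu,n)+r_0(\nu',n)+\sum_{i=1}^{3}r_i(n), \tag{16}$$

where the sequences $\{r_0(\nu,n)\}_{n\ge0}$ and $\{r_i(n)\}_{n\ge0}$, $i=1,2,3$, are defined by

$$r_0(\nu,n) := \mathbb P^\star\big(\log\Phi_{\nu,D}(Y_0,Y_1)\le-M_0n\big), \tag{17}$$

$$r_1(n) := \mathbb P^\star\Big(\sum_{i=0}^{n}\log\Upsilon_{\mathsf X}(Y_i)\ge M_1n\Big), \tag{18}$$

$$r_2(n) := \mathbb P^\star\Big(\sum_{i=0}^{n}\log\Psi_D(Y_i)\le-M_2n\Big), \tag{19}$$

$$r_3(n) := \mathbb P^\star\Big(\sum_{i=0}^{n}\mathbf 1_K(Y_i)\le(1+\gamma)n/2\Big). \tag{20}$$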
3 Applications to HMM

We now discuss conditions under which (13) and (14) hold (Propositions 4 to 6) and under which the right-hand side of (16) vanishes (Proposition 7 to Corollary 11). To that goal, we assume that $\{Y_k\}_{k\ge0}$ is the observation of an HMM $\{X_k,Y_k\}_{k\ge0}$ with Markov kernel $T_\star:=Q_\star\otimes G_\star$, where $Q_\star$ is a transition kernel on $(\mathsf X,\mathcal X)$ and $G_\star$ is a Markov kernel from $(\mathsf X,\mathcal X)$ to $(\mathsf Y,\mathcal Y)$, and with initial distribution $\nu_\star$ on $(\mathsf X,\mathcal X)$.

Recall that a kernel $P$ on a general state space $(\mathsf Z,\mathcal Z)$ is phi-irreducible and (strongly) aperiodic if there exists a $\sigma$-finite measure $\varphi$ on $(\mathsf Z,\mathcal Z)$ such that, for any $A\in\mathcal Z$ satisfying $\varphi(A)>0$ and any initial condition $x$, $P^n(x,A)>0$ for all $n$ sufficiently large. A set $C\in\mathcal Z$ is called petite for the Markov kernel $P$ if, for some probability measure $m$ on $\mathbb N$ with finite mean (which can always be assumed without loss of generality (20, Proposition 5.5.6)),

$$\sum_{n=0}^{\infty}m(n)P^n(x,A)\ge\epsilon_C\lambda_C(A), \quad\text{for all } x\in C,\ A\in\mathcal Z,$$

where $\lambda_C$ is a measure on $(\mathsf Z,\mathcal Z)$ satisfying $\lambda_C(C)>0$ and $\epsilon_C>0$. We denote by $\mathbb P^P_\nu$ and $\mathbb E^P_\nu$ the probability distribution and the expectation on the canonical probability space $(\mathsf Z^{\mathbb N},\mathcal Z^{\otimes\mathbb N})$ associated to the Markov chain with transition kernel $P$ and initial distribution $\nu$.

We first state sufficient conditions for $T_\star$ to be an aperiodic positive Harris chain (see definitions and main properties in (20, Chapters 10 & 13) and (5, Chapter 14)) and for the law of large numbers to hold for the Markov chain with kernel $T_\star$.

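Proposition 4 Assume that $Q_\star$ is an aperiodic, positive Harris Markov kernel with stationary distribution $\pi_\star$. Then, the kernel $T_\star$ defined by

$$T_\star[(x,y),A] := \iint Q_\star(x,dx')\,G_\star(x',dy')\,\mathbf 1_A(x',y'), \quad A\in\mathcal X\otimes\mathcal Y,$$

is an aperiodic positive Harris Markov kernel with stationary distribution $\pi_\star\otimes G_\star$. In addition, for any initial distribution $\nu_\star$ on $(\mathsf X,\mathcal X)$ and any function $\psi\in B^+(\mathsf X\times\mathsf Y)$ satisfying $\pi_\star\otimes G_\star(\psi)<\infty$,

$$n^{-1}\sum_{i=0}^{n}\psi(X_i,Y_i)\to\pi_\star\otimes G_\star(\psi), \quad\mathbb P^{T_\star}_{\nu_\star\otimes G_\star}\text{-a.s.} \tag{21}$$

Corollary 5 If $\pi_\star\otimes G_\star([\log\Upsilon_{\mathsf X}]_+)<\infty$ (resp. $\pi_\star\otimes G_\star([\log\Psi_D]_-)<\infty$), then condition (13) (resp. (14)) is satisfied with $\mathbb P^\star:=\mathbb P^{T_\star}_{\nu_\star\otimes G_\star}$.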
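The law of large numbers (21) can be sanity-checked by simulation. The sketch below assumes a hypothetical scalar linear Gaussian model $X_{k+1}=\phi X_k+\sigma\zeta_{k+1}$, $Y_k=X_k+\beta\epsilon_k$, and the test function $\psi(x,y)=y^2$, for which $\pi_\star\otimes G_\star(\psi)=\sigma^2/(1-\phi^2)+\beta^2$; the chain is started at $x_0=0$ rather than at stationarity, which (21) allows. It is an illustration only, not part of the argument.

```python
import random


def lln_average(phi, sigma, beta, n, seed=0, x0=0.0):
    """Time average n^{-1} sum_{i<n} psi(X_i, Y_i) with the hypothetical psi(x, y) = y^2."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        y = x + beta * rng.gauss(0.0, 1.0)         # Y_i ~ G(X_i, .) = N(X_i, beta^2)
        total += y * y                             # psi(X_i, Y_i)
        x = phi * x + sigma * rng.gauss(0.0, 1.0)  # X_{i+1} ~ Q(X_i, .)
    return total / n


def stationary_mean(phi, sigma, beta):
    """pi (x) G(psi) for psi(x, y) = y^2: stationary variance of X plus beta^2."""
    return sigma ** 2 / (1.0 - phi ** 2) + beta ** 2


print(lln_average(0.5, 1.0, 0.5, 200_000, seed=1), stationary_mean(0.5, 1.0, 0.5))
```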

In many problems of interest, it is not straightforward to establish that the chain is positive Harris; in addition, the distribution $\pi_\star$ is not known explicitly, making the conditions of Corollary 5 difficult to check. It is often more convenient to apply the following result, which is a direct consequence of the $f$-norm ergodic theorem and the law of large numbers for positive Harris chains (see for example (20, Theorems 14.0.1, 17.0.1)).

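Proposition 6 Let $f_\star\ge1$ be a function on $\mathsf X$. Assume that $Q_\star$ is a phi-irreducible Markov kernel and that there exist a petite set $C_\star$, a function $V_\star:\mathsf X\to[1,\infty)$ and a constant $b_\star$ satisfying

$$Q_\star V_\star(x)\le V_\star(x)-f_\star(x)+b_\star\mathbf 1_{C_\star}(x). \tag{22}$$

Then, the kernel $Q_\star$ is positive Harris with invariant probability $\pi_\star$ and $\pi_\star(f_\star)<+\infty$. Let $\psi\in B^+(\mathsf X\times\mathsf Y)$ be a function such that

$$\sup_{x\in\mathsf X}f_\star^{-1}(x)\,G_\star(x,\psi(x,\cdot))<\infty. \tag{23}$$

Then, $\pi_\star\otimes G_\star(\psi)<\infty$.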

We now derive conditions under which the sequence $\{r_0(\nu,n)\}_{n\ge0}$ can be bounded.

Proposition 7 Assume (H1-2) and that the drift function $V$ defined in (H2) satisfies $\sup_{\mathsf X}V^{-1}QV<\infty$.

(i) If for some $p>1$,

$$\sup_{i=0,1}\sup_{\mathsf X}V^{-1}\,\mathbb E^\star\big[[\log g(\cdot,Y_i)]_-^p\big]<\infty, \tag{24}$$

then there exists a constant $C$ such that, for any initial probability measure $\nu$ on $(\mathsf X,\mathcal X)$ such that $\nu Q\mathbf 1_D>0$ and all $n\ge0$, $r_0(\nu,n)\le Cn^{-p}\nu(V)$.

(ii) If for some $\lambda>0$,

$$\sup_{i=0,1}\sup_{\mathsf X}V^{-1}\,\mathbb E^\star\big[\exp(\lambda[\log g(\cdot,Y_i)]_-)\big]<\infty, \tag{25}$$

then there exist positive constants $C$ and $\delta$ such that, for any initial probability measure $\nu$ on $(\mathsf X,\mathcal X)$ such that $\nu Q\mathbf 1_D>0$ and all $n\ge0$, $r_0(\nu,n)\le Ce^{-\delta n}\nu(V)$.

To determine the rate of convergence of the sequences $\{r_i(n)\}_{n\ge0}$, $i=1,2,3$, to zero, deviation inequalities for partial sums of the observations $\{Y_k\}_{k\ge0}$ are required. There are a variety of techniques to prove such results, depending on the type of assumptions which are available. If polynomial rates are enough, then one can apply the standard Markov inequality together with the Marcinkiewicz-Zygmund inequality; see for example (7) or (12).

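Proposition 8 Assume that

(i) $Q_\star$ is an aperiodic and positive Harris Markov kernel with stationary distribution $\pi_\star$;

(ii) there exist a petite set $C_\star$, functions $f_\star,V_\star,W_\star:\mathsf X\to[1,\infty)$ and a constant $b_\star$ satisfying $\pi_\star(W_\star)<\infty$ and

$$Q_\star V_\star\le V_\star-f_\star+b_\star\mathbf 1_{C_\star}, \qquad Q_\star W_\star\le W_\star-V_\star+b_\star\mathbf 1_{C_\star}.$$

Let $p>1$. There exists a constant $C<\infty$ such that for any function $\psi$ on $(\mathsf Y,\mathcal Y)$ satisfying $\sup_{\mathsf X}f_\star^{-1}G_\star(\cdot,|\psi|^p)<\infty$ and $\sup_{\mathsf X}f_\star^{-1}V_\star^{1-1/p}G_\star(\cdot,|\psi|)<\infty$, for any initial probability distribution $\nu_\star$ on $(\mathsf X,\mathcal X)$ and any $\delta>0$,

$$\mathbb P^{T_\star}_{\nu_\star\otimes G_\star}\Big(\Big|\sum_{i=1}^{n}\{\psi(Y_i)-\pi_\star\otimes G_\star(\psi)\}\Big|\ge\delta n\Big)\le C\delta^{-p}n^{-(p/2\vee1)}\nu_\star(W_\star).$$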



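Corollary 9 If there exists $p>1$ such that

$$\sup_{\mathsf X}f_\star^{-1}G_\star(\cdot,|\log\Upsilon_{\mathsf X}|^p)<\infty, \qquad \sup_{\mathsf X}f_\star^{-1}V_\star^{1-1/p}G_\star(\cdot,|\log\Upsilon_{\mathsf X}|)<\infty,$$

and

$$\sup_{\mathsf X}f_\star^{-1}G_\star(\cdot,|\log\Psi_D|^p)<\infty, \qquad \sup_{\mathsf X}f_\star^{-1}V_\star^{1-1/p}G_\star(\cdot,|\log\Psi_D|)<\infty,$$

then there exist finite constants $C$ and $M_i$, $i=1,2$, such that

$$r_i(n)\le Cn^{-(p/2\vee1)}\nu_\star(W_\star), \quad i=1,2.$$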

If we wish to establish that the sequences $\{r_i(n)\}_{n\ge0}$ decrease to zero exponentially fast, we might for example use the multiplicative ergodic theorem (17, Theorem 1.2) to bound an exponential moment of the partial sum, and then use the Markov inequality. This requires checking the multiplicative analog of the additive drift condition (22).

Some additional definitions are needed. Let $W:\mathsf X\to(0,\infty)$ be a function. We say that the function $W$ is unbounded if $\sup_{\mathsf X}W=+\infty$. We denote by $\mathcal G_W$ the set of functions whose growth at infinity is lower than $W$, i.e. $F$ belongs to $\mathcal G_W$ if and only if

$$\sup_{\mathsf X}(|F|-W)<\infty. \tag{26}$$

Proposition 10 Let $W_\star:\mathsf X\to(0,\infty)$ be an unbounded function such that the level sets $\{W_\star\le r\}$ are petite. Assume that $Q_\star$ is phi-irreducible and that there exist a function $V_\star:\mathsf X\to[1,\infty)$ and a constant $b_\star<\infty$ such that

$$\log(V_\star^{-1}Q_\star V_\star)\le-W_\star+b_\star. \tag{27}$$

Then, $Q_\star$ is positive Harris with a unique invariant probability distribution $\pi_\star$, satisfying $\pi_\star(V_\star)<\infty$. Let $\psi$ be a non-negative function. If for some $\lambda_\star>0$,

$$\log[G_\star(\cdot,e^{\lambda_\star\psi})]\in\mathcal G_{W_\star}, \tag{28}$$

then there exists a constant $M>0$ such that, for any initial distribution $\nu_\star$ satisfying $\nu_\star(V_\star)<\infty$,

$$\limsup_{n\to\infty}n^{-1}\log\mathbb P^{T_\star}_{\nu_\star\otimes G_\star}\Big(\sum_{i=0}^{n}\psi(Y_i)\ge Mn\Big)<0. \tag{29}$$

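Corollary 11 Assume that for some $\lambda_\star>0$,

$$\log[G_\star(\cdot,e^{\lambda_\star[\log\Upsilon_{\mathsf X}]_+})]\in\mathcal G_{W_\star} \quad\text{and}\quad \log[G_\star(\cdot,e^{\lambda_\star[\log\Psi_D]_-})]\in\mathcal G_{W_\star}.$$

Then there exist constants $M_i$, $i=1,2$, such that $\limsup_{n\to\infty}n^{-1}\log r_i(n)<0$, where $\{r_i(n)\}_{n\ge0}$ are defined in (18) and (19).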

4 Examples 

In this section, we illustrate our results using different models of interest. 
4.1 The dynamic tobit model



The dynamic tobit model is simply the time series extension of the standard univariate tobit model, in which the univariate hidden process is only observed when it is positive ((19) and (1)):

$$X_{k+1}=\phi X_k+\sigma\zeta_k, \qquad Y_k=\max(X_k+\beta\epsilon_k,0), \tag{30}$$



where $\{(\zeta_k,\epsilon_k)\}_{k\ge0}$ is a sequence of i.i.d. standard Gaussian vectors, $|\phi|<1$, $\sigma>0$ and $\beta>0$. Here $\mathsf X=\mathbb R$, $\mathsf Y=\mathbb R_+$, and $\mathcal X$ and $\mathcal Y$ are the corresponding Borel $\sigma$-algebras. The model is partially dominated (see (2)) with respect to the dominating measure $\delta_0+\lambda^{\mathrm{Leb}}$, where $\lambda^{\mathrm{Leb}}$ is the Lebesgue measure and $\delta_0$ is the Dirac mass at zero. The transition kernel $Q_{\phi,\sigma}$ and the likelihood $g_\beta$ are respectively given by

$$Q_{\phi,\sigma}(x,A)=(2\pi\sigma^2)^{-1/2}\int\exp\left[-(1/2\sigma^2)(x'-\phi x)^2\right]\mathbf 1_A(x')\,\lambda^{\mathrm{Leb}}(dx'), \tag{31}$$

$$g_\beta(x,y)=\mathbf 1\{y=0\}(2\pi\beta^2)^{-1/2}\int_x^\infty\exp\left[-(1/2\beta^2)v^2\right]\lambda^{\mathrm{Leb}}(dv)+\mathbf 1\{y>0\}(2\pi\beta^2)^{-1/2}\exp\left[-(1/2\beta^2)(y-x)^2\right]. \tag{32}$$

We denote $Q=Q_{\phi,\sigma}$ and $g=g_\beta$.
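The likelihood (32) combines an atom at $y=0$ with a Lebesgue density on $(0,\infty)$. A minimal sketch (the function names are ours) evaluates it, writing the atom as $\int_x^\infty(2\pi\beta^2)^{-1/2}e^{-v^2/(2\beta^2)}dv=\tfrac12\operatorname{erfc}(x/(\beta\sqrt2))$, the probability that $x+\beta\epsilon\le0$, and checks numerically that $g_\beta(x,\cdot)$ integrates to one against $\delta_0+\lambda^{\mathrm{Leb}}$.

```python
import math


def _gauss(z, beta):
    """Density of N(0, beta^2) at z."""
    return math.exp(-z * z / (2.0 * beta * beta)) / math.sqrt(2.0 * math.pi * beta * beta)


def tobit_likelihood(x, y, beta):
    """g_beta(x, y) of (32), a density with respect to delta_0 + Lebesgue on [0, inf)."""
    if y < 0.0:
        return 0.0
    if y == 0.0:
        # atom: P(x + beta * eps <= 0) = int_x^inf N(0, beta^2)(dv)
        return 0.5 * math.erfc(x / (beta * math.sqrt(2.0)))
    return _gauss(y - x, beta)


def total_mass(x, beta, n=20000):
    """g(x, 0) + int_0^inf g(x, y) dy, the integral by the trapezoidal rule on [0, max(x, 0) + 12 beta]."""
    upper = max(x, 0.0) + 12.0 * beta
    h = upper / n
    vals = [_gauss(k * h - x, beta) for k in range(n + 1)]  # continuous part, incl. its limit at 0+
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return tobit_likelihood(x, 0.0, beta) + integral


print([round(total_mass(x, 0.7), 8) for x in (-1.0, 0.0, 2.5)])
```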

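We assume that $\{Y_k\}_{k\ge0}$ are the observations of a tobit model (30) with initial distribution $\nu_\star$ and 'parameters' $\phi_\star,\sigma_\star,\beta_\star$ (which may be different from $\phi,\sigma,\beta$) satisfying $|\phi_\star|<1$, $\sigma_\star>0$ and $\beta_\star>0$. We denote by $Q_\star:=Q_{\phi_\star,\sigma_\star}$, $G_\star(x,\cdot):=g_{\beta_\star}(x,\cdot)(\delta_0+\lambda^{\mathrm{Leb}})$ and $\mathbb E^\star:=\mathbb E^{T_\star}_{\nu_\star\otimes G_\star}$, where $T_\star:=Q_\star\otimes G_\star$.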



4.1.1 Assumptions H1 and H2

It is easily seen that any set of the form $C=\{x:|x|\le C\}$, $C>0$, satisfies the local Doeblin property (7), with $\lambda_C(\cdot)=(2C)^{-1}\lambda^{\mathrm{Leb}}(\mathbf 1_C\,\cdot)$. Assumption (H1) is trivially satisfied. To check (H2), we set $K=\mathsf Y$ and $V(x)=e^{c|x|}$ for some $c>0$. The function $V^{-1}QV$ is locally bounded and $\lim_{|x|\to\infty}V^{-1}(x)QV(x)=0$. Therefore, since $\sup_{\mathsf X\times\mathsf Y}g(x,y)\le1\vee(2\pi\beta^2)^{-1/2}$, for any $\eta>0$ one may choose a constant $C>0$ large enough so that $\Upsilon_{C^c}(y)\le\eta\Upsilon_{\mathsf X}(y)$, where $C=\{|x|\le C\}$ and $\Upsilon_A$ is defined in (8). Therefore, (H2) is satisfied.

4.1.2 Application of Theorem 1

We now check conditions (12) to (14) of Theorem 1. Conditions (12) and (13) are obvious since $K=\mathsf Y$ and $\sup_{\mathsf Y}\Upsilon_{\mathsf X}<\infty$. We now check (14) with $D=\{|x|\le D\}$ and $\lambda_D(\cdot)=(2D)^{-1}\lambda^{\mathrm{Leb}}(\mathbf 1_D\,\cdot)$, where $D$ is an arbitrary positive constant. $Q_\star(x,\cdot)$ is a Gaussian distribution with mean $\phi_\star x$ and standard deviation $\sigma_\star$. Using standard arguments, $Q_\star$ is aperiodic, positive Harris with invariant distribution $\pi_\star$, which is a centered Gaussian distribution with variance $\sigma_\star^2/(1-\phi_\star^2)$, and any compact set is petite. By the Jensen inequality, $\log\lambda_D(g(\cdot,y)\mathbf 1_D)=\log\lambda_D(g(\cdot,y))\ge\lambda_D(\log g(\cdot,y))$, which implies

$$\log\Psi_D(y)=\log\lambda_D(g(\cdot,y))\ge\mathbf 1\{y=0\}\log\Big\{(2\pi\beta^2)^{-1/2}\int_D^\infty e^{-v^2/(2\beta^2)}\lambda^{\mathrm{Leb}}(dv)\Big\}+\mathbf 1\{y>0\}\Big\{-(1/2)\log(2\pi\beta^2)-(12D\beta^2)^{-1}\big((D+y)^3+(D-y)^3\big)\Big\}, \tag{33}$$

so that $\pi_\star\otimes G_\star([\log\Psi_D]_-)<\infty$. Corollary 5 implies (14). Combining the results above, Theorem 1 therefore applies, showing that (15) holds for any probabilities $\nu$ and $\nu'$ such that $\int\nu(dx)e^{c|x|}+\int\nu'(dx)e^{c|x|}<\infty$ for some $c>0$.
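For $y>0$ the integrand in the Jensen bound is quadratic in $x$, and $(D+y)^3+(D-y)^3=2D^3+6Dy^2$, so the second line of (33) equals $-(1/2)\log(2\pi\beta^2)-(D^2+3y^2)/(6\beta^2)$, which grows quadratically in $y$. The sketch below (illustrative names, $\lambda_D$ uniform on $[-D,D]$ as above) compares this closed form with a midpoint-rule average of $\log g(\cdot,y)$ and checks the Jensen inequality numerically.

```python
import math


def log_g_pos(x, y, beta):
    """log g_beta(x, y) for y > 0, from (32)."""
    return -0.5 * math.log(2.0 * math.pi * beta ** 2) - (y - x) ** 2 / (2.0 * beta ** 2)


def jensen_lower_bound(y, beta, D):
    """Second line of (33): lambda_D(log g(., y)) for y > 0, lambda_D uniform on [-D, D]."""
    return (-0.5 * math.log(2.0 * math.pi * beta ** 2)
            - ((D + y) ** 3 + (D - y) ** 3) / (12.0 * D * beta ** 2))


def uniform_average(f, D, n=4000):
    """Midpoint-rule approximation of (2D)^{-1} int_{-D}^{D} f(x) dx."""
    h = 2.0 * D / n
    return sum(f(-D + (k + 0.5) * h) for k in range(n)) / n


b, D, y = 0.8, 2.0, 1.0
print(uniform_average(lambda x: log_g_pos(x, y, b), D), jensen_lower_bound(y, b, D))
```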

4.1.3 Application of Theorem 3

We now consider the convergence of the expectation of the total variation distance at a polynomial rate. For all $p>1$, there exists a constant $C$ such that, for any $i\in\{0,1\}$, $\mathbb E^\star[Y_i^{2p}]\le C(1+\mathbb E^\star[X_i^{2p}])$, which is finite since $\{X_i\}$ is Gaussian. Therefore,

$$\sup_x(1+|x|^2)^{-p}\,\mathbb E^\star\big[[\log g(x,Y_i)]_-^p\big]<\infty, \tag{34}$$

which implies (24) since $V(x)=\exp(c|x|)$. By Proposition 7, there exists a constant $C$ such that for any probability measure $\nu$ such that $\nu(V)<\infty$, $r_0(\nu,n)\le Cn^{-p}\nu(V)$. Since $\sup_{\mathsf Y}\Upsilon_{\mathsf X}<\infty$, we may choose $M_1>0$ such that $M_1>\sup_{\mathsf Y}\log\Upsilon_{\mathsf X}$; for this choice, $r_1(n)=0$, where $\{r_1(n)\}_{n\ge0}$ is defined in (18). Since $K=\mathsf Y$, $r_3(n)=0$, where $\{r_3(n)\}_{n\ge0}$ is defined in (20). We now consider $\{r_2(n)\}_{n\ge0}$ and apply Proposition 8. To that goal, we further assume that there exists $p_\star>1$ such that $\nu_\star(|x|^{3p_\star+1})<\infty$. It is easily seen that the drift condition (22) is satisfied with $V_\star(x)=1+|x|^{3p_\star}$ and $f_\star\sim|x|^{3p_\star-1}$; furthermore, upon noting that $[\log\Psi_D(y)]_-\sim|y|^2$, we have

$$\sup_{\mathsf X}f_\star^{-1}G_\star(\cdot,[\log\Psi_D]_-^{p_\star})<\infty, \qquad \sup_{\mathsf X}f_\star^{-1}V_\star^{1-1/p_\star}G_\star(\cdot,[\log\Psi_D]_-)<\infty,$$

thus proving $\limsup_{n\to\infty}n^{(p_\star/2)\vee1}r_2(n)<\infty$. Therefore, by Theorem 3, the expectation $\mathbb E^\star(\|\phi_{\nu,n}[Y_{0:n}]-\phi_{\nu',n}[Y_{0:n}]\|_{\mathrm{TV}})$ goes to zero at the rate $n^{-((p_\star/2)\vee1)}$ for any initial distributions $\nu,\nu'$ such that $\int\{\nu(dx)+\nu'(dx)\}\exp(c|x|)<+\infty$.

The exponential decay can be proved similarly under the assumption that, for some $c>0$, $\int\nu_\star(dx)\exp(c|x|)<+\infty$; details are omitted.

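4.2 Non-linear State-Space models

We consider the model (6) borrowed from (16). Assume that $\beta>0$ and that the following conditions hold.

NLG(b, h) The functions $b$ and $h$ are locally bounded and

$$\lim_{|x|\to\infty}(|x+b(x)|-|x|)=-\infty. \tag{35}$$

NLG(σ) The noise variance is non-degenerate:

$$0<\inf_{x\in\mathbb R^d}\inf_{\{\lambda\in\mathbb R^d,|\lambda|=1\}}\lambda^T\sigma(x)\sigma^T(x)\lambda\le\sup_{x\in\mathbb R^d}\sup_{\{\lambda\in\mathbb R^d,|\lambda|=1\}}\lambda^T\sigma(x)\sigma^T(x)\lambda<\infty. \tag{36}$$

The model is partially dominated with respect to the Lebesgue measure. The transition kernel $Q_{b,\sigma}$ and the likelihood $g_{h,\beta}$ are respectively given by

$$Q_{b,\sigma}(x,A)=(2\pi)^{-d/2}|\sigma(x)|^{-1}\int\exp\left(-(1/2)|x'-x-b(x)|^2_{\sigma(x)}\right)\mathbf 1_A(x')\,\lambda^{\mathrm{Leb}}(dx'), \tag{37}$$

$$g_{h,\beta}(x,y)=(2\pi\beta^2)^{-\ell/2}\exp\left(-|y-h(x)|^2/2\beta^2\right), \tag{38}$$

where $|u|^2_{\sigma(x)}:=u^T[\sigma(x)\sigma^T(x)]^{-1}u$. As above, we set $Q=Q_{b,\sigma}$ and $g=g_{h,\beta}$.

Assume that $\{Y_k\}_{k\ge0}$ are the observations of a non-linear Gaussian state space model (6) with initial distribution $\nu_\star$ and 'parameters' $b_\star$, $h_\star$, $\sigma_\star$ and $\beta_\star$. We assume that $\beta_\star>0$, that the functions $b_\star$, $h_\star$ and $\sigma_\star$ satisfy NLG($b_\star$, $h_\star$)-NLG($\sigma_\star$), and that

$$\limsup_{|x|\to\infty}|x|^{-1}\log|h_\star(x)|<\infty. \tag{39}$$

We denote by $Q_\star:=Q_{b_\star,\sigma_\star}$, $G_\star(x,\cdot):=g_{h_\star,\beta_\star}(x,\cdot)\lambda^{\mathrm{Leb}}$ and $\mathbb E^\star:=\mathbb E^{T_\star}_{\nu_\star\otimes G_\star}$, where $T_\star:=Q_\star\otimes G_\star$.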



4.2.1 Assumptions H1 and H2

Under NLG(b, h)-NLG(σ), every bounded Borel set in $\mathbb R^d$ is locally Doeblin in the sense given by (7). (H1) is trivial. Set $V(x)=\exp(c|x|)$, where $c$ is a positive constant. The likelihood $g$ is bounded by $(2\pi\beta^2)^{-\ell/2}$ and, under NLG(σ), there exists a constant $M<\infty$ such that $V^{-1}(x)QV(x)\le M\exp[c(|x+b(x)|-|x|)]$. Therefore, under NLG(b, h)-NLG(σ), for any $\eta>0$, we may choose a constant $C$ large enough such that $\Upsilon_{C^c}(y)\le\eta\Upsilon_{\mathsf X}(y)$ for any $y\in\mathsf Y$, where $C=\{x\in\mathbb R^d,|x|\le C\}$. Hence, assumption (H2) is satisfied with $K=\mathsf Y$.



4.2.2 Application of Theorem 1

Condition (12) is trivial since $K=\mathsf Y$. Condition (13) is obvious too since $\Upsilon_{\mathsf X}$ is everywhere bounded. For (14), let us apply Corollary 5 and Proposition 6. $Q_\star$ is aperiodic, phi-irreducible, and compact sets are petite. Set $D=\{x\in\mathbb R^d,|x|\le D\}$, where $D>0$, and define $\lambda_D(\cdot)=\lambda^{\mathrm{Leb}}(\mathbf 1_D\,\cdot)/\lambda^{\mathrm{Leb}}(D)$. Noting that $|y-h(x)|^2\le2(|y|^2+|h(x)|^2)$,

$$[\log g(x,y)]_-\le\beta^{-2}|y|^2+\beta^{-2}|h(x)|^2+(\ell/2)\log(2\pi\beta^2). \tag{40}$$

Since the function $h$ is locally bounded, $\sup_D|h|^2<\infty$ and (40) implies that

$$[\log\Psi_D(y)]_-\le\lambda_D([\log g(\cdot,y)]_-)\le\beta^{-2}|y|^2+\beta^{-2}\sup_D|h|^2+(\ell/2)\log(2\pi\beta^2). \tag{41}$$

We set $V_\star(x)=e^{c_\star|x|}$; we may find a compact (and thus petite) set $C_\star$ and constants $\lambda_\star\in(0,1)$ and $s_\star$ such that $Q_\star V_\star\le\lambda_\star V_\star+s_\star\mathbf 1_{C_\star}$, so that (22) is satisfied with $f_\star=(1-\lambda_\star)V_\star$. Hence $Q_\star$ is positive Harris recurrent and $\pi_\star(V_\star)<+\infty$. Furthermore, Eq. (41) implies that there exists a constant $C<\infty$ such that

$$G_\star(x,[\log\Psi_D]_-)\le C(1+|h_\star(x)|^2)\le C\big(1+V_\star(x)\sup_{\mathsf X}V_\star^{-1}|h_\star|^2\big). \tag{42}$$

The right-hand side is finite provided $c_\star>2\limsup_{|x|\to\infty}|x|^{-1}\log|h_\star(x)|$, which we assume hereafter.

Therefore, by Corollary 5 and Proposition 6, Theorem 1 applies: (15) holds for any initial probability measures $\nu,\nu'$ such that $\int e^{c|x|}\nu(dx)+\int e^{c|x|}\nu'(dx)<+\infty$ for some $c>0$.



4.2.3 Application of Theorem 3

We wish to establish a geometric rate of convergence, and for that purpose we use Proposition 7 and Proposition 10. We set $W(x)=c\{|x|-|x+b(x)|\}\vee1$ and $W_\star(x)=c_\star\{|x|-|x+b_\star(x)|\}\vee1$, and assume that

$$|h|^2\in\mathcal G_W \quad\text{and}\quad |h_\star|^2\in\mathcal G_{W_\star}. \tag{43}$$

$W_\star$ is unbounded and its level sets are petite for $Q_\star$. Furthermore, $V_\star(x)=e^{c_\star|x|}$, where $c_\star>0$, satisfies the multiplicative drift condition (27). Let $\lambda<\beta^2(2\wedge\beta_\star^{-2})/4$. Since $\lambda\beta^{-2}<\beta_\star^{-2}/4$, Eq. (40) implies that there exists a constant $C<\infty$ such that for any integer $i$,

$$\mathbb E^\star\Big[\exp\big(\lambda[\log g(x,Y_i)]_-\big)\Big]\le C\,\mathbb E^\star\Big[\exp\big(2\lambda\beta^{-2}|h_\star(X_i)|^2\big)\Big]\exp\big(\lambda\beta^{-2}|h(x)|^2\big).$$

Since $\lambda<\beta^2/2$, Lemma 18 shows that $\sup_i\mathbb E^\star[\exp(2\lambda\beta^{-2}|h_\star(X_i)|^2)]<\infty$ provided $\nu_\star(V_\star)<+\infty$, which is henceforth assumed. Therefore, Proposition 7 applies, showing that there exists $\delta>0$ such that for any probability measure $\nu$ such that $\nu(V)<\infty$, $r_0(\nu,n)\le Ce^{-\delta n}\nu(V)$. As in Section 4.1, because $\Upsilon_{\mathsf X}$ is bounded, we may choose $M_1$ large enough so that $r_1(n)=0$ (see (18)); similarly, since $K=\mathsf Y$, $r_3(n)=0$. Eq. (41) implies that, for any $\lambda_\star$ small enough, $\log G_\star(\cdot,e^{\lambda_\star[\log\Psi_D]_-})\in\mathcal G_{W_\star}$. Proposition 10 shows that $\limsup_{n\to\infty}n^{-1}\log r_2(n)<0$. Hence Theorem 3 applies: for any initial distributions $\nu,\nu'$ such that $\int\{\nu(dx)+\nu'(dx)\}\exp(c|x|)<+\infty$, $\mathbb E^\star(\|\phi_{\nu,n}[Y_{0:n}]-\phi_{\nu',n}[Y_{0:n}]\|_{\mathrm{TV}})$ goes to zero at a geometric rate.

4.3 Stochastic Volatility Model

As a final example, we consider the stochastic volatility (SV) model. In the canonical SV model for discrete-time data (14; 15), the observations $\{Y_k\}_{k\ge0}$ are the compounded returns and $\{X_k\}_{k\ge0}$ is the log-volatility, which is assumed to follow a stationary autoregression of order 1, i.e.

$$X_{k+1}=\phi X_k+\sigma\zeta_k, \qquad Y_k=\beta\exp(X_k/2)\epsilon_k, \tag{44}$$

where $\{(\zeta_k,\epsilon_k)\}_{k\ge0}$ is an i.i.d. sequence of standard Gaussian vectors, $|\phi|<1$, $\sigma>0$ and $\beta>0$. Here $\mathsf X=\mathsf Y=\mathbb R$, and $\mathcal X$ and $\mathcal Y$ are the Borel $\sigma$-fields. The model is partially dominated with respect to the Lebesgue measure. The transition kernel $Q_{\phi,\sigma}$ and the likelihood $g_\beta$ are respectively given by

$$Q_{\phi,\sigma}(x,A)=(2\pi\sigma^2)^{-1/2}\int\exp\left(-(x'-\phi x)^2/(2\sigma^2)\right)\mathbf 1_A(x')\,\lambda^{\mathrm{Leb}}(dx'), \tag{45}$$

$$g_\beta(x,y)=(2\pi\beta^2)^{-1/2}\exp\left(-y^2\exp(-x)/2\beta^2-x/2\right). \tag{46}$$

We denote $Q=Q_{\phi,\sigma}$ and $g=g_\beta$.

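We assume that $\{Y_k\}_{k\ge0}$ are the observations of the stochastic volatility model (44) with initial distribution $\nu_\star$ and parameters $|\phi_\star|<1$, $\sigma_\star>0$ and $\beta_\star>0$. We denote as above $Q_\star=Q_{\phi_\star,\sigma_\star}$, $G_\star(x,\cdot)=g_{\beta_\star}(x,\cdot)\lambda^{\mathrm{Leb}}$, $T_\star=Q_\star\otimes G_\star$ and $\mathbb E^\star=\mathbb E^{T_\star}_{\nu_\star\otimes G_\star}$.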

4.3.1 Assumptions H1 and H2

As in Example 4.1, every bounded Borel set is locally Doeblin in the sense of (7). Assumption (H1) is satisfied, but the likelihood is not uniformly bounded over $\mathsf{X} \times \mathsf{Y}$; nevertheless, it is easily seen that $\sup_{x \in \mathsf{X}} g(x, y) \le (2\pi e)^{-1/2} |y|^{-1}$. We set $K = \mathbb{R}$ and put $V(x) = e^{c|x|}$, where $c$ is positive; as in Example 4.1, $QV(\cdot)/V(\cdot)$ is locally bounded and $\lim_{|x| \to \infty} QV(x)/V(x) = 0$, showing that assumption (H2) is fulfilled.

4.3.2 Application of Theorem 1

The Markov kernel $Q_*$ is positive recurrent and geometrically ergodic, and its stationary distribution $\pi_*$ is Gaussian with mean $0$ and variance $\sigma_*^2/(1 - \phi_*^2)$. Note that there exists a constant $C < \infty$ such that, for all $y \in \mathsf{Y}$, $[\log \Upsilon_{\mathsf{X}}(y)]_+ \le C |\log|y||$, which implies that $G_*(x, [\log \Upsilon_{\mathsf{X}}]_+) \le C + |x|/2$ for some constant $C < \infty$. This implies that $\pi_* \otimes G_*([\log \Upsilon_{\mathsf{X}}]_+) < \infty$, and Corollary 5 implies (13). Set $D = \{x : |x| \le D\}$, where $D > 0$, and let $\lambda_D(\cdot) = \lambda^{\mathrm{Leb}}(\mathbb{1}_D \cdot)/\lambda^{\mathrm{Leb}}(D)$. By the Jensen inequality,

$$
\log \Psi_D(y) \ge \lambda_D(\log g(\cdot, y)) = -\frac{1}{2}\log(2\pi\beta^2) - \frac{y^2 \sinh(D)}{2\beta^2 D} ,
$$

showing that there exists a constant $C < \infty$ such that $[\log \Psi_D(y)]_- \le C(1 + y^2)$. Therefore, $G_*(x, [\log \Psi_D]_-) \le C(1 + \beta_*^2 e^{x})$. The conditions of Corollary 5 are satisfied, showing that (14) holds. As a result, (15) holds for any initial distributions $\nu$ and $\nu'$ such that $\int \nu(dx) \exp(c|x|) + \int \nu'(dx) \exp(c|x|) < \infty$.
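The closed form above uses $\lambda_D(x) = 0$ and $\lambda_D(e^{-x}) = \sinh(D)/D$ for $\lambda_D$ uniform on $[-D, D]$. The sketch below compares it with a midpoint-rule average and checks the direction of the Jensen step, assuming (as the display suggests) that $\Psi_D(y) = \lambda_D(g(\cdot, y))$; the quadrature size and the values of $y$, $\beta$, $D$ are arbitrary.

```python
import math


def log_g(x, y, beta):
    """log g_beta(x, y) from Eq. (46), evaluated directly to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * beta**2) - y**2 * math.exp(-x) / (2 * beta**2) - x / 2


def uniform_average(f, D, m=20000):
    """Midpoint-rule approximation of lambda_D(f) for lambda_D uniform on [-D, D]."""
    h = 2 * D / m
    return sum(f(-D + (j + 0.5) * h) for j in range(m)) / m


def jensen_closed_form(y, beta, D):
    """-(1/2) log(2 pi beta^2) - y^2 sinh(D) / (2 beta^2 D)."""
    return -0.5 * math.log(2 * math.pi * beta**2) - y**2 * math.sinh(D) / (2 * beta**2 * D)


y, beta, D = 1.3, 0.8, 2.0  # illustrative values only
mean_log = uniform_average(lambda x: log_g(x, y, beta), D)                   # lambda_D(log g)
log_mean = math.log(uniform_average(lambda x: math.exp(log_g(x, y, beta)), D))  # log lambda_D(g)
print(mean_log, jensen_closed_form(y, beta, D), log_mean >= mean_log)
```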

The problem of computing the convergence rates can be addressed as in the other examples.

5 Proof of Theorems 1 and 3 

Before proving the main results, some additional definitions are needed. A function $f$ defined on $\bar{\mathsf{X}} = \mathsf{X} \times \mathsf{X}$ is said to be symmetric if, for all $(x, x') \in \bar{\mathsf{X}}$, $f(x, x') = f(x', x)$. An unnormalised transition kernel $\bar P$ on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$, where $\bar{\mathcal{X}} = \mathcal{X} \otimes \mathcal{X}$, is said to be symmetric if, for all $(x, x')$ in $\bar{\mathsf{X}}$ and any positive symmetric function $f$, $\bar P[(x, x'), f] = \bar P[(x', x), f]$. For $P$ a Markov kernel on $(\mathsf{X}, \mathcal{X})$, we denote by $\bar P$ the transition kernel on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$ defined, for any $(x, x') \in \bar{\mathsf{X}}$ and $A, A' \in \mathcal{X}$, by

$$
\bar P[(x, x'), A \times A'] = P(x, A) P(x', A') . \tag{47}
$$
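On a finite state space, (47) is simply the tensor (Kronecker) product of the transition matrix with itself. The sketch below builds $\bar P$ for an arbitrary 3-state kernel (an illustrative assumption) and checks that it is Markov and symmetric in the sense just defined; the helper names are hypothetical.

```python
import itertools
import random


def product_kernel(P):
    """Pbar[(x, x'), (z, z')] = P(x, z) * P(x', z'), Eq. (47), on a finite space."""
    states = list(itertools.product(range(len(P)), repeat=2))
    Pbar = {(s, t): P[s[0]][t[0]] * P[s[1]][t[1]] for s in states for t in states}
    return Pbar, states


def apply_kernel(Pbar, states, s, f):
    """Pbar[s, f] = sum over t of Pbar(s, t) f(t)."""
    return sum(Pbar[(s, t)] * f[t] for t in states)


P = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]  # illustrative Markov kernel
Pbar, states = product_kernel(P)

random.seed(1)
f = {}  # a positive symmetric function: f(x, x') = f(x', x)
for a, b in states:
    if (b, a) not in f:
        f[(a, b)] = f[(b, a)] = random.uniform(0.1, 2.0)

print(apply_kernel(Pbar, states, (0, 2), f), apply_kernel(Pbar, states, (2, 0), f))
```

The symmetry follows by exchanging the summation variables $z \leftrightarrow z'$ and using $f(z, z') = f(z', z)$.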



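For any $A \in \mathcal{X}$ and any two probability distributions $\nu$ and $\nu'$ on $(\mathsf{X}, \mathcal{X})$, the difference $\phi_{\nu,n}[y_{0:n}](A) - \phi_{\nu',n}[y_{0:n}](A)$ may be expressed as

$$
\begin{aligned}
\phi_{\nu,n}[y_{0:n}](A) - \phi_{\nu',n}[y_{0:n}](A)
&= \frac{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i, y_i)\mathbb{1}_A(X_n)\right]}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i, y_i)\right]} - \frac{\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i, y_i)\mathbb{1}_A(X_n)\right]}{\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i, y_i)\right]} \\
&= \frac{\mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(X_i, X'_i, y_i)\mathbb{1}_A(X_n)\right] - \mathbb{E}^{\bar Q}_{\nu'\otimes\nu}\left[\prod_{i=0}^n \bar g(X_i, X'_i, y_i)\mathbb{1}_A(X_n)\right]}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i, y_i)\right]\, \mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i, y_i)\right]} ,
\end{aligned}
\tag{48}
$$

where $\bar g(x, x', y) = g(x, y) g(x', y)$. The idea of writing the difference using a pair of independent processes was apparently introduced in (3); this approach is central in the work of (16). We consider the numerator and the denominator of Eq. (48) separately. For the numerator, the path of the independent processes is decomposed along the successive visits to $C \times C$, as done in (16).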
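Identity (48) can be checked numerically on a finite-state HMM, where every expectation reduces to a forward recursion. In the sketch below, `lik[i][x]` stands for $g(x, y_i)$ for a fixed observation sequence; the kernel, likelihood values and initial distributions are arbitrary illustrative choices. The pair recursion runs on $\mathsf{X} \times \mathsf{X}$ with $\bar Q$ from (47) and $\bar g$, rather than exploiting the product structure, so that the two sides of (48) are computed independently.

```python
import itertools


def forward(init, Q, lik):
    """alpha_n(x) = E_init^Q[prod_{i<=n} g(X_i, y_i) ; X_n = x] (unnormalised)."""
    N = len(init)
    alpha = [init[x] * lik[0][x] for x in range(N)]
    for lik_i in lik[1:]:
        alpha = [sum(alpha[z] * Q[z][x] for z in range(N)) * lik_i[x] for x in range(N)]
    return alpha


def pair_forward(init, init_prime, Q, lik):
    """Same recursion for the pair chain with kernel Qbar and gbar(x, x', y) = g(x, y) g(x', y)."""
    S = list(itertools.product(range(len(init)), repeat=2))
    alpha = {s: init[s[0]] * init_prime[s[1]] * lik[0][s[0]] * lik[0][s[1]] for s in S}
    for lik_i in lik[1:]:
        alpha = {
            t: sum(alpha[s] * Q[s[0]][t[0]] * Q[s[1]][t[1]] for s in S) * lik_i[t[0]] * lik_i[t[1]]
            for t in S
        }
    return alpha


def lhs(nu, nu_p, Q, lik, A):
    """phi_{nu,n}(A) - phi_{nu',n}(A): difference of the two normalised filters."""
    a, b = forward(nu, Q, lik), forward(nu_p, Q, lik)
    return sum(a[x] for x in A) / sum(a) - sum(b[x] for x in A) / sum(b)


def rhs(nu, nu_p, Q, lik, A):
    """Pair-process expression in the second line of (48)."""
    p1, p2 = pair_forward(nu, nu_p, Q, lik), pair_forward(nu_p, nu, Q, lik)
    num = sum(v for (x, _), v in p1.items() if x in A) - sum(v for (x, _), v in p2.items() if x in A)
    return num / (sum(forward(nu, Q, lik)) * sum(forward(nu_p, Q, lik)))


Q = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]
lik = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.5], [0.6, 0.6, 0.2], [0.3, 0.7, 0.9]]
nu, nu_p = [1.0, 0.0, 0.0], [0.2, 0.3, 0.5]
for A in ({0}, {1, 2}, {0, 2}):
    print(sorted(A), lhs(nu, nu_p, Q, lik, A), rhs(nu, nu_p, Q, lik, A))
```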

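Proposition 12 Let $C$ be an LD-set and let $\nu$ and $\nu'$ be two probability distributions on $(\mathsf{X}, \mathcal{X})$. For any integer $n$ and functions $g_i \in \mathrm{B}_+(\mathsf{X})$, $i = 0, \dots, n$, such that $\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g_i(X_i)\right] < \infty$ and $\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g_i(X_i)\right] < \infty$, define

$$
\Delta_n(\nu, \nu', \{g_i\}_{i=0}^n) = \sup_{A \in \mathcal{X}} \left| \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g_i(X_i, X'_i)\mathbb{1}_A(X_n)\right] - \mathbb{E}^{\bar Q}_{\nu'\otimes\nu}\left[\prod_{i=0}^n \bar g_i(X_i, X'_i)\mathbb{1}_A(X_n)\right] \right| , \tag{49}
$$

where $\bar g_i(x, x') = g_i(x) g_i(x')$. Then,

$$
\Delta_n(\nu, \nu', \{g_i\}_{i=0}^n) \le \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g_i(X_i, X'_i)\, \rho_C^{N_{C,n}}\right] , \tag{50}
$$

where $\bar Q$ is defined as in (47) and

$$
N_{C,n} = \sum_{i=0}^{n-1} \mathbb{1}_{C\times C}(X_i, X'_i)\, \mathbb{1}_{C\times C}(X_{i+1}, X'_{i+1}) , \tag{51}
$$

$$
\rho_C = 1 - (\epsilon_C^-/\epsilon_C^+)^2 . \tag{52}
$$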
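For a given pair of paths, (51) just counts consecutive joint visits to $C \times C$, and (50) weights the path by $\rho_C^{N_{C,n}}$. A minimal sketch, in which the paths, the set $C$ and the LD constants are arbitrary values chosen for illustration:

```python
def joint_visit_count(path, path_prime, C):
    """N_{C,n} of Eq. (51): number of i in {0, ..., n-1} such that both
    (X_i, X'_i) and (X_{i+1}, X'_{i+1}) lie in C x C."""
    inside = [x in C and xp in C for x, xp in zip(path, path_prime)]
    return sum(1 for now, nxt in zip(inside, inside[1:]) if now and nxt)


def rho(eps_minus, eps_plus):
    """rho_C = 1 - (eps_C^- / eps_C^+)^2, Eq. (52)."""
    return 1.0 - (eps_minus / eps_plus) ** 2


path, path_prime, C = [0, 1, 0, 2, 0, 0], [1, 1, 0, 0, 2, 1], {0, 1}
n_C = joint_visit_count(path, path_prime, C)  # 2 for this pair of paths
weight = rho(0.5, 1.0) ** n_C                  # contraction factor rho_C^{N_{C,n}}
print(n_C, weight)
```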



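PROOF. Put $\bar x = (x, x')$, $\bar g_i(\bar x) = g_i(x) g_i(x')$, $\bar C = C \times C$, and $\lambda_{\bar C} = \lambda_C \otimes \lambda_C$. We stress that the kernels defined in this proof may be unnormalized. Since $C$ is a locally Doeblin set, we have, for any measurable positive function $f$ on $(\bar{\mathsf{X}}, \bar{\mathcal{X}})$,

$$
(\epsilon_C^-)^2 \lambda_{\bar C}(\mathbb{1}_{\bar C} f) \le \bar Q(\bar x, \mathbb{1}_{\bar C} f) \le (\epsilon_C^+)^2 \lambda_{\bar C}(\mathbb{1}_{\bar C} f) , \quad \text{for all } \bar x \in \bar C . \tag{53}
$$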
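Define the unnormalised kernels $\bar Q_0$ and $\bar Q_1$ by

$$
\bar Q_0(\bar x, f) = \mathbb{1}_{\bar C}(\bar x)(\epsilon_C^-)^2 \lambda_{\bar C}(\mathbb{1}_{\bar C} f) , \tag{54}
$$

$$
\bar Q_1(\bar x, f) = \bar Q(\bar x, f) - \mathbb{1}_{\bar C}(\bar x)(\epsilon_C^-)^2 \lambda_{\bar C}(\mathbb{1}_{\bar C} f) = \bar Q(\bar x, f) - \bar Q_0(\bar x, f) . \tag{55}
$$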

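Eq. (53) implies that, for all $\bar x \in \bar C$, $0 \le \bar Q_1(\bar x, \mathbb{1}_{\bar C} f) \le \rho_C \bar Q(\bar x, \mathbb{1}_{\bar C} f)$. It then follows, using straightforward algebra, that

$$
\begin{aligned}
\bar Q_1(\bar x, f) &= \mathbb{1}_{\bar C}(\bar x)\bar Q_1(\bar x, \mathbb{1}_{\bar C} f) + \mathbb{1}_{\bar C}(\bar x)\bar Q_1(\bar x, \mathbb{1}_{\bar C^c} f) + \mathbb{1}_{\bar C^c}(\bar x)\bar Q_1(\bar x, f) \\
&\le \rho_C \mathbb{1}_{\bar C}(\bar x)\bar Q(\bar x, \mathbb{1}_{\bar C} f) + \mathbb{1}_{\bar C}(\bar x)\bar Q(\bar x, \mathbb{1}_{\bar C^c} f) + \mathbb{1}_{\bar C^c}(\bar x)\bar Q(\bar x, f) \\
&\le \bar Q\left(\bar x, \rho_C^{\mathbb{1}_{\bar C}(\bar x)\mathbb{1}_{\bar C}} f\right) .
\end{aligned}
\tag{56}
$$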
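On a finite state space, (53)-(56) can be inspected entry by entry. The sketch below assumes that $\lambda_C$ is the uniform probability on $C$, so that admissible LD constants are $|C|$ times the minimum and maximum of $Q(x, z)$ over $x, z \in C$. It checks that $\bar Q_1$ is non-negative and dominated by $\rho_C \bar Q$ on $\bar C \times \bar C$, and that it coincides with $\bar Q$ on rows outside $\bar C$; the kernel, the set $C$ and the helper name are illustrative.

```python
import itertools


def pair_kernel_split(Q, C):
    """rho_C, Cbar, state list, Qbar and Qbar1 of (47), (52), (54), (55) on a finite space.

    Assumes lambda_C uniform on C, so eps_C^- = |C| min Q and eps_C^+ = |C| max Q on C x C."""
    em = len(C) * min(Q[x][z] for x in C for z in C)
    ep = len(C) * max(Q[x][z] for x in C for z in C)
    S = list(itertools.product(range(len(Q)), repeat=2))
    Cbar = {(x, xp) for x in C for xp in C}
    lam = 1.0 / len(Cbar)  # lambda_Cbar = lambda_C (x) lambda_C, uniform on Cbar
    Qbar = {(s, t): Q[s[0]][t[0]] * Q[s[1]][t[1]] for s in S for t in S}
    Qbar0 = {(s, t): em**2 * lam if (s in Cbar and t in Cbar) else 0.0 for s in S for t in S}
    Qbar1 = {k: Qbar[k] - Qbar0[k] for k in Qbar}
    return 1 - (em / ep) ** 2, Cbar, S, Qbar, Qbar1


Q = [[0.5, 0.3, 0.2], [0.25, 0.45, 0.3], [0.1, 0.2, 0.7]]  # illustrative kernel
rho_C, Cbar, S, Qbar, Qbar1 = pair_kernel_split(Q, C=[0, 1])
print(rho_C)
```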

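We write $\Delta_n(\nu, \nu', \{g_i\}_{i=0}^n) = \sup_{A \in \mathcal{X}} |\Delta_n(A)|$, where

$$
\Delta_n(A) = \nu\otimes\nu'\left(\bar g_0 \bar Q \bar g_1 \cdots \bar g_{n-1}\bar Q \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) - \nu'\otimes\nu\left(\bar g_0 \bar Q \bar g_1 \cdots \bar g_{n-1}\bar Q \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) . \tag{57}
$$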
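Note that $\Delta_n(A)$ may be decomposed as $\Delta_n(A) = \sum_{t_{0:n-1} \in \{0,1\}^n} \Delta_n(A, t_{0:n-1})$, where

$$
\Delta_n(A, t_{0:n-1}) = \nu\otimes\nu'\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) - \nu'\otimes\nu\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) .
$$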

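Note that, for any $t_{0:n-1} \in \{0,1\}^n$ and any sets $A, B \in \mathcal{X}$,

$$
\nu'\otimes\nu\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times B}\right) = \nu\otimes\nu'\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{B\times A}\right) . \tag{58}
$$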



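First assume that there exists an index $i \ge 0$ such that $t_i = 0$; then,

$$
\begin{aligned}
&\nu\otimes\nu'\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) \\
&\quad= \nu\otimes\nu'\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar Q_{t_{i-1}} \bar g_i \mathbb{1}_{\bar C}\right) \times (\epsilon_C^-)^2 \lambda_{\bar C}\left(\mathbb{1}_{\bar C}\bar g_{i+1}\bar Q_{t_{i+1}} \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right) \\
&\quad= \nu'\otimes\nu\left(\bar g_0 \bar Q_{t_0} \bar g_1 \cdots \bar Q_{t_{i-1}} \bar g_i \mathbb{1}_{\bar C}\right) \times (\epsilon_C^-)^2 \lambda_{\bar C}\left(\mathbb{1}_{\bar C}\bar g_{i+1}\bar Q_{t_{i+1}} \cdots \bar g_{n-1}\bar Q_{t_{n-1}} \bar g_n \mathbb{1}_{A\times\mathsf{X}}\right)
\end{aligned}
$$

by (58). Thus, $\Delta_n(A, t_{0:n-1})$ is equal to $0$ unless $t_i = 1$ for all $i$, and (58) finally implies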



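$$
\Delta_n(A) = \nu\otimes\nu'\left(\bar g_0 \bar Q_1 \bar g_1 \cdots \bar g_{n-1}\bar Q_1 \bar g_n \left(\mathbb{1}_{A\times\mathsf{X}} - \mathbb{1}_{\mathsf{X}\times A}\right)\right) .
$$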



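Using (56), we have

$$
\Delta_n(\nu, \nu', \{g_i\}_{i=0}^n) \le \nu\otimes\nu'\left(\bar g_0 \bar Q_1 \bar g_1 \cdots \bar g_{n-1}\bar Q_1 \bar g_n\right) \le \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g_i(X_i, X'_i)\, \rho_C^{N_{C,n}}\right] ,
$$

where the last inequality is straightforward to establish by induction on $n$. This completes the proof.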



Remark 13 If the whole state space $\mathsf{X}$ is locally Doeblin, then one may take $C = \mathsf{X}$ in the previous expression. Since $N_{\mathsf{X},n} = n$, (48) and the previous proposition therefore imply the uniform ergodicity of the filtering distribution: for any initial distributions $\nu$ and $\nu'$ and any sequence $y_{0:n}$, $\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\|_{TV} \le \rho_{\mathsf{X}}^n$, where $\rho_{\mathsf{X}} = 1 - (\epsilon_{\mathsf{X}}^-/\epsilon_{\mathsf{X}}^+)^2$; see (2) and (10).
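When $\mathsf{X}$ is finite and $Q$ has strictly positive entries, the whole space is locally Doeblin. Taking $\lambda_{\mathsf{X}}$ to be the uniform distribution (an assumption made for this illustration), one may choose $\epsilon_{\mathsf{X}}^- = |\mathsf{X}| \min Q$ and $\epsilon_{\mathsf{X}}^+ = |\mathsf{X}| \max Q$. The sketch below draws a random model of this kind and checks the bound of Remark 13 for filters started from two point masses; all numerical values and helper names are arbitrary.

```python
import random


def filters(init, Q, lik):
    """Normalised filters phi_{init,k}, k = 0, ..., n, by the forward recursion."""
    N = len(init)
    out, phi = [], None
    for k, lik_k in enumerate(lik):
        if k == 0:
            unnorm = [init[x] * lik_k[x] for x in range(N)]
        else:
            unnorm = [sum(phi[z] * Q[z][x] for z in range(N)) * lik_k[x] for x in range(N)]
        total = sum(unnorm)
        phi = [u / total for u in unnorm]
        out.append(phi)
    return out


def tv(p, q):
    """sup_A |p(A) - q(A)|, i.e. half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


random.seed(0)
N, n_obs = 4, 30
rows = [[random.uniform(0.5, 2.0) for _ in range(N)] for _ in range(N)]
Q = [[q / sum(r) for q in r] for r in rows]  # strictly positive Markov kernel
eps_minus = N * min(min(r) for r in Q)       # lambda_X uniform on X (assumption)
eps_plus = N * max(max(r) for r in Q)
rho_X = 1 - (eps_minus / eps_plus) ** 2
lik = [[random.uniform(0.01, 1.0) for _ in range(N)] for _ in range(n_obs)]
dist = [tv(a, b) for a, b in zip(filters([1, 0, 0, 0], Q, lik), filters([0, 0, 0, 1], Q, lik))]
print(rho_X, dist[:5])
```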

We now consider the denominator of (48). A lower bound for the denominator was obtained in (4, Lemma 2.2) using change-of-measure ideas. We use here a more straightforward argument.



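Proposition 14 For any LD-set $C \in \mathcal{X}$, any $n \ge 1$ and any functions $g_i \in \mathrm{B}_+(\mathsf{X})$, $i = 0, \dots, n$,

$$
\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g_i(X_i)\right] \ge (\epsilon_C^-)^{n-1}\, \nu(g_0 Q g_1 \mathbb{1}_C) \prod_{i=2}^n \lambda_C(g_i \mathbb{1}_C) .
$$

PROOF. The proof follows immediately from

$$
\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g_i(X_i)\right] \ge \mathbb{E}^Q_\nu\left[g_0(X_0)\prod_{i=1}^n g_i(X_i)\mathbb{1}_C(X_i)\right]
$$

and the minorization condition (7).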
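Proposition 14 can be checked exactly on a finite chain. The sketch below again assumes that $\lambda_C$ is uniform on $C$, so that $\epsilon_C^- = |C| \min_{x, z \in C} Q(x, z)$ is an admissible constant in (7); the kernel, the functions $g_i$ and the set $C$ are random or arbitrary illustrative choices.

```python
import random


def expectation_product(nu, Q, g):
    """E_nu^Q[prod_{i=0}^n g_i(X_i)] by backward recursion; g[i][x] = g_i(x)."""
    N = len(nu)
    h = list(g[-1])
    for g_i in reversed(g[:-1]):
        h = [g_i[x] * sum(Q[x][z] * h[z] for z in range(N)) for x in range(N)]
    return sum(nu[x] * h[x] for x in range(N))


def prop14_bound(nu, Q, g, C):
    """(eps_C^-)^{n-1} nu(g_0 Q g_1 1_C) prod_{i>=2} lambda_C(g_i 1_C), lambda_C uniform on C."""
    N = len(nu)
    eps_minus = len(C) * min(Q[x][z] for x in C for z in C)
    first = sum(nu[x] * g[0][x] * sum(Q[x][z] * g[1][z] for z in C) for x in range(N))
    bound = eps_minus ** (len(g) - 2) * first  # len(g) - 2 = n - 1
    for g_i in g[2:]:
        bound *= sum(g_i[x] for x in C) / len(C)
    return bound


random.seed(2)
N, n = 4, 6
rows = [[random.uniform(0.1, 1.0) for _ in range(N)] for _ in range(N)]
Q = [[q / sum(r) for q in r] for r in rows]
g = [[random.uniform(0.05, 1.0) for _ in range(N)] for _ in range(n + 1)]
nu, C = [0.25] * N, [0, 2]
print(expectation_product(nu, Q, g), prop14_bound(nu, Q, g, C))
```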



By combining Propositions 12 and 14, we obtain an explicit bound for the total variation distance $\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\|_{TV}$.

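Lemma 15 Let $\beta \in (0,1)$. Then, for any LD-sets $C \subset \mathsf{X}$ and $D \subset \mathsf{X}$, any initial probability measures $\nu$ and $\nu'$, and any function $V : \mathsf{X} \to [1, \infty)$,

$$
\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\|_{TV} \le \rho_C^{\beta n} + \frac{\prod_{i=0}^n \Upsilon_{\mathsf{X}}(y_i)\, \max_{I \subset \{0,\dots,n\},\, |I| = a_n} \prod_{i \in I} \Upsilon_{C^c}(y_i) \prod_{i \notin I} \Upsilon_{\mathsf{X}}(y_i)\; \nu(V)\nu'(V)}{(\epsilon_D^-)^{2(n-1)}\, \Phi_{\nu,D}(y_0, y_1)\, \Phi_{\nu',D}(y_0, y_1) \prod_{i=2}^n \Psi_D^2(y_i)} ,
$$

where $a_n = \lfloor n(1-\beta)/2 \rfloor$, $|I|$ is the cardinality of the set $I$, and the functions $\Phi_{\nu,D}$ and $\Psi_D$ are defined in (10) and (11), respectively.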
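Because $\Upsilon_{C^c} \le \Upsilon_{\mathsf{X}}$, the maximum over subsets in Lemma 15 does not require enumerating all $\binom{n+1}{a_n}$ subsets: writing the product as $\prod_i \Upsilon_{\mathsf{X}}(y_i) \times \prod_{i \in I} \Upsilon_{C^c}(y_i)/\Upsilon_{\mathsf{X}}(y_i)$, it suffices to keep the $a_n$ largest ratios. The sketch below compares this with brute force, using placeholder positive numbers in place of $\Upsilon_{\mathsf{X}}(y_i) > 0$ and $\Upsilon_{C^c}(y_i)$ (an assumption for illustration); the helper names are hypothetical.

```python
import itertools
import math
import random


def a_n(n, beta):
    """a_n = floor(n (1 - beta) / 2)."""
    return math.floor(n * (1 - beta) / 2)


def max_subset_brute(u, v, a):
    """max over I with |I| = a of prod_{i in I} u_i * prod_{i not in I} v_i."""
    best = 0.0
    for I in itertools.combinations(range(len(u)), a):
        chosen = set(I)
        best = max(best, math.prod(u[i] if i in chosen else v[i] for i in range(len(u))))
    return best


def max_subset_greedy(u, v, a):
    """Same quantity when 0 < u_i <= v_i: keep the a largest ratios u_i / v_i."""
    ratios = sorted((ui / vi for ui, vi in zip(u, v)), reverse=True)
    return math.prod(v) * math.prod(ratios[:a])


random.seed(3)
n, beta = 8, 0.3
v = [random.uniform(0.5, 2.0) for _ in range(n + 1)]  # stands for Upsilon_X(y_i)
u = [vi * random.uniform(0.1, 1.0) for vi in v]       # stands for Upsilon_{C^c}(y_i) <= Upsilon_X(y_i)
a = a_n(n, beta)
print(a, max_subset_brute(u, v, a), max_subset_greedy(u, v, a))
```

The greedy version costs a sort, $O(n \log n)$, instead of a number of products that grows combinatorially in $n$.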
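PROOF. Write $\bar X_k = (X_k, X'_k)$. Eq. (50) implies that, for any $\beta \in (0,1)$,

$$
\Delta_n(\nu, \nu', \{g(\cdot, y_i)\}_{i=0}^n) \le \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\, \rho_C^{N_{C,n}} \mathbb{1}\{N_{C,n} \ge \beta n\}\right] + \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\, \mathbb{1}\{N_{C,n} < \beta n\}\right] .
$$

The first term in the RHS is bounded by $\rho_C^{\beta n}\, \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\right]$. We now consider the second term. For any set $\bar A \in \bar{\mathcal{X}}$, denote by $M_{\bar A,n}$ the number of visits of $\{\bar X_k\}_{k \ge 0}$ to the set $\bar A$ before $n$. By Lemma 17, the condition $N_{C,n} < \beta n$ implies that $M_{\bar C,n} < (n(1+\beta)+1)/2$ and $M_{\bar C^c,n} \ge a_n$. Note that, for any $\bar x \in \bar{\mathsf{X}}$ and $y \in \mathsf{Y}$,

$$
\bar g(\bar x, y)\, \bar Q \bar V(\bar x) \le [A(y)]^{\mathbb{1}_{\bar C^c}(\bar x)}[B(y)]^{\mathbb{1}_{\bar C}(\bar x)}\, \bar V(\bar x) , \tag{59}
$$

where we have set $\bar V(\bar x) = V(x)V(x')$, $A(y) = \sup_{\bar x \in \bar C^c} \bar g(\bar x, y)\bar V^{-1}(\bar x)\bar Q\bar V(\bar x)$, and $B(y) = \sup_{\bar x \in \bar{\mathsf{X}}} \bar g(\bar x, y)\bar V^{-1}(\bar x)\bar Q\bar V(\bar x)$. Consider the process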



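$$
V_0 = \bar V(\bar X_0) , \qquad V_n = \left\{\prod_{i=0}^{n-1} \frac{\bar g(\bar X_i, y_i)}{[A(y_i)]^{\mathbb{1}_{\bar C^c}(\bar X_i)}[B(y_i)]^{\mathbb{1}_{\bar C}(\bar X_i)}}\right\} \bar V(\bar X_n) , \quad n \ge 1 , \tag{60}
$$

where by convention we have set $0/0 = 0$ (to deal with cases where either $A(y) = 0$ or $B(y) = 0$). The process $\{V_n\}_{n \ge 0}$ is an $\mathcal{F}$-supermartingale, where $\mathcal{F} = \{\mathcal{F}_n\}$ is the natural filtration of the process $\{\bar X_k\}_{k \ge 0}$, $\mathcal{F}_n = \sigma(\bar X_0, \dots, \bar X_n)$. Denote by $\tau_{a_n}$ the $a_n$-th return time to the set $\bar C^c$. On the event $\{M_{\bar C^c,n} \ge a_n\}$, $\tau_{a_n} \le n$; using that $A(y) \le B(y)$,

$$
\prod_{i=0}^{\tau_{a_n}} [A(y_i)]^{\mathbb{1}_{\bar C^c}(\bar X_i)}[B(y_i)]^{\mathbb{1}_{\bar C}(\bar X_i)} \prod_{i=\tau_{a_n}+1}^{n} B(y_i) \le C(y_{0:n}) ,
$$

where $C(y_{0:n}) = \max_{I \subset \{0,\dots,n\},\, |I| = a_n} \prod_{i \in I} A(y_i) \prod_{i \notin I} B(y_i)$. Therefore,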



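$$
\begin{aligned}
\mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\mathbb{1}\{N_{C,n} < \beta n\}\right]
&\le \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\mathbb{1}\{M_{\bar C^c,n} \ge a_n\}\right] \\
&\le C(y_{0:n})\, \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \frac{\bar g(\bar X_i, y_i)}{[A(y_i)]^{\mathbb{1}_{\bar C^c}(\bar X_i)}[B(y_i)]^{\mathbb{1}_{\bar C}(\bar X_i)}}\, \bar V(\bar X_{n+1})\right] = C(y_{0:n})\, \mathbb{E}^{\bar Q}_{\nu\otimes\nu'}[V_{n+1}] ,
\end{aligned}
$$

where the second inequality also uses $\bar V \ge 1$. The supermartingale inequality therefore implies

$$
\mathbb{E}^{\bar Q}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar g(\bar X_i, y_i)\mathbb{1}\{N_{C,n} < \beta n\}\right] \le C(y_{0:n})\, \nu(V)\nu'(V) ,
$$

and the proof follows from (48) and Proposition 14, using that $A(y) \le \Upsilon_{\mathsf{X}}(y)\Upsilon_{C^c}(y)$ and $B(y) = \Upsilon_{\mathsf{X}}^2(y)$, where $\Upsilon_A(y)$ is defined in (8).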

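Corollary 16 Assume (H2). Let $D$ be an LD-set, and let $\gamma$ and $\beta$ be constants satisfying $\gamma \in (0,1)$ and $\beta \in (0,\gamma)$. Then, for any $\eta \in (0,1)$, there exists an LD-set $C$ such that, for any sequence $y_{0:n} \in \mathsf{Y}^{n+1}$ satisfying $\sum_{j=0}^n \mathbb{1}_K(y_j) \ge (1+\gamma)n/2$, any initial probability measures $\nu$ and $\nu'$, and any $n \ge 1$,

$$
\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\|_{TV} \le \rho_C^{\beta n} + \frac{\eta^{(\gamma-\beta)n/2} \prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(y_i)\; \nu(V)\nu'(V)}{(\epsilon_D^-)^{2(n-1)}\, \Phi_{\nu,D}(y_0, y_1)\, \Phi_{\nu',D}(y_0, y_1) \prod_{i=2}^n \Psi_D^2(y_i)} ,
$$

where $\rho_C$, $\Phi_{\nu,D}$ and $\Psi_D$ are defined in (52), (10) and (11), respectively.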
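PROOF. [Proof of Theorem 1] The conditions (13) and (14) imply that there exists a constant $M < \infty$ such that

$$
\limsup_{n \to \infty} \exp(-2Mn) \prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(Y_i) < 1 \quad \text{and} \quad \limsup_{n \to \infty} \exp(-2Mn) \prod_{i=2}^n \Psi_D^{-2}(Y_i) < 1 .
$$

Condition (H1) and $\nu Q \mathbb{1}_D > 0$ imply that $\Phi_{\nu,D}(y_0, y_1) > 0$ for any $(y_0, y_1) \in \mathsf{Y}^2$. We then choose $\eta$ small enough so that

$$
\lim_{n \to \infty} \eta^{(\gamma-\beta)n/2} \exp(4Mn)\, (\epsilon_D^-)^{-2(n-1)} = 0 .
$$

The proof follows from Corollary 16.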
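The choice of $\eta$ above is explicit: since $(\epsilon_D^-)^{-2(n-1)} = (\epsilon_D^-)^2 (\epsilon_D^-)^{-2n}$, the sequence tends to zero as soon as $q = \eta^{(\gamma-\beta)/2} e^{4M} (\epsilon_D^-)^{-2} < 1$, that is, $\eta < \left[(\epsilon_D^-)^2 e^{-4M}\right]^{2/(\gamma-\beta)}$. A short arithmetic sketch, with illustrative constants and hypothetical helper names:

```python
import math


def eta_threshold(M, eps_D_minus, gamma, beta):
    """eta* = [(eps_D^-)^2 exp(-4M)]^(2 / (gamma - beta)); any eta < eta* gives q < 1."""
    return (eps_D_minus**2 * math.exp(-4 * M)) ** (2.0 / (gamma - beta))


def rate(eta, M, eps_D_minus, gamma, beta):
    """Per-step factor q = eta^((gamma - beta)/2) exp(4M) (eps_D^-)^(-2)."""
    return eta ** ((gamma - beta) / 2) * math.exp(4 * M) / eps_D_minus**2


M, eps_D_minus, gamma, beta = 1.0, 0.5, 0.8, 0.4  # illustrative constants only
eta_star = eta_threshold(M, eps_D_minus, gamma, beta)
eta = 0.5 * eta_star
print(eta_star, rate(eta, M, eps_D_minus, gamma, beta))
```

The threshold shrinks quickly when $M$ grows or $\gamma - \beta$ is small, which is consistent with the fact that Theorem 1 only asserts the existence of a suitable $\eta$.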

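PROOF. [Proof of Theorem 3] Note that, for any $\alpha \in (0,1)$ and any integer $n$,

$$
\mathbb{E}_*\left[\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\|_{TV}\right] \le \alpha^n + \mathbb{P}_*\left[\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\|_{TV} \ge \alpha^n\right] .
$$

Consider now the second term in the RHS of the previous equation. Denote by $\Omega_n$ the event

$$
\Omega_n = \left\{ \log\Phi_{\nu,D}(Y_0, Y_1) \ge -M_0 n,\ \log\Phi_{\nu',D}(Y_0, Y_1) \ge -M_0 n,\ \sum_{i=0}^n \log\Upsilon_{\mathsf{X}}(Y_i) \le M_1 n,\ \sum_{i=2}^n \log\Psi_D(Y_i) \ge -M_2 n,\ \sum_{i=1}^n \mathbb{1}_K(Y_i) \ge n(1+\gamma)/2 \right\} .
$$

Clearly, $\mathbb{P}_*(\Omega_n^c) \le \sum_{i=1}^3 r_i(n) + r_0(\nu, n) + r_0(\nu', n)$, where $\{r_i(n)\}_{n \ge 0}$ and $\{r_0(\nu, n)\}_{n \ge 0}$ are defined in Eqs. (17)-(20). On the event $\Omega_n$,

$$
\Phi_{\nu,D}^{-1}(Y_0, Y_1)\, \Phi_{\nu',D}^{-1}(Y_0, Y_1) \prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(Y_i) \prod_{i=2}^n \Psi_D^{-2}(Y_i) \le e^{2n\sum_{i=0}^2 M_i} .
$$

One may choose $\eta > 0$ small enough and $\varrho \in (0,1)$ so that, for any $n$,

$$
\eta^{(\gamma-\beta)n/2}\, e^{2n\sum_{i=0}^2 M_i}\, (\epsilon_D^-)^{-2(n-1)} \le \varrho^n .
$$

The proof then follows from Corollary 16.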



6 Proof of Propositions 7, 8, and 10 



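PROOF. [Proof of Proposition 7] By the Jensen inequality applied to the convex function $u \mapsto [\log(u)]_-^p$, we obtain that, for any $p \ge 1$,

$$
\left[\log\Phi_{\nu,D}(Y_0, Y_1) - \log(\nu Q \mathbb{1}_D)\right]_-^p \le (\nu Q \mathbb{1}_D)^{-1} \iint \nu(dx_0)\, Q(x_0, dx_1)\, \mathbb{1}_D(x_1) \left[\sum_{j=0}^1 \log g(x_j, Y_j)\right]_-^p , \tag{61}
$$

which implies, by the Fubini theorem,

$$
\mathbb{E}_*\left\{\left[\log\Phi_{\nu,D}(Y_0, Y_1) - \log(\nu Q \mathbb{1}_D)\right]_-^p\right\} \le 2^{p-1} (\nu Q \mathbb{1}_D)^{-1} \iint \nu(dx_0)\, Q(x_0, dx_1) \sum_{j=0}^1 \mathbb{E}_*\left[\log g(x_j, Y_j)\right]_-^p .
$$

Since $\sup_x V^{-1}(x)\, \mathbb{E}_*[\log g(x, Y_i)]_-^p < \infty$ and $\sup_x V^{-1} Q V(x) < \infty$,

$$
\sum_{i=0}^1 \iint \nu(dx_0)\, Q(x_0, dx_1)\, \mathbb{E}_*[\log g(x_i, Y_i)]_-^p \le \nu(V) \left\{\max_{i=0,1} \sup_x V^{-1}(x)\, \mathbb{E}_*[\log g(x, Y_i)]_-^p\right\} \left(1 + \sup_x V^{-1} Q V(x)\right) . \tag{62}
$$

Similarly, for $\lambda > 0$, using the Jensen inequality with $u \mapsto \exp\left((\lambda/2)[\log u]_-\right)$ and the Fubini theorem, we have

$$
\mathbb{E}_*\left[\exp\left(\frac{\lambda}{2}\left[\log\Phi_{\nu,D}(Y_0, Y_1) - \log(\nu Q \mathbb{1}_D)\right]_-\right)\right] \le (\nu Q \mathbb{1}_D)^{-1} \iint \nu(dx_0)\, Q(x_0, dx_1)\, \mathbb{E}_*^{1/2}\left[e^{\lambda[\log g(x_0, Y_0)]_-}\right] \mathbb{E}_*^{1/2}\left[e^{\lambda[\log g(x_1, Y_1)]_-}\right] ,
$$

and the proof follows since $\sup_x V^{-1/2} Q V^{1/2}(x) < \infty$.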



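PROOF. [Proof of Proposition 8] Let $\varphi$ be a non-negative function on $\mathsf{Y}$. Assume that $\sup_x V^{-1}(x)\, G_*(x, |\varphi|^p) < \infty$. Proposition 6 shows that $\pi_*[G_*(\cdot, |\varphi|^p)] < \infty$. Without loss of generality, we assume that $\pi_*[G_*(\cdot, \varphi)] = 0$ (otherwise replace $\varphi$ by $\varphi - \pi_*[G_*(\cdot, \varphi)]$, for which $\sup_x V^{-1}(x) G_*(x, |\varphi|^p)$ remains finite since $V \ge 1$). For any $p \ge 1$,

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n \varphi(Y_i)\right|^p\right] \le 2^{p-1}\, \mathbb{E}_*\left[\left|\sum_{i=0}^n \{\varphi(Y_i) - G_*(X_i, \varphi)\}\right|^p\right] + 2^{p-1}\, \mathbb{E}_*\left[\left|\sum_{i=0}^n G_*(X_i, \varphi)\right|^p\right] . \tag{63}
$$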



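Since, conditionally on $X_{0:n}$, the random variables $Y_{0:n}$ are independent, we may apply the Marcinkiewicz-Zygmund inequality (13, Inequality 2.6.18 p. 82), showing that there exists a constant $c(p)$ depending only on $p$ such that

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n \{\varphi(Y_i) - G_*(X_i, \varphi)\}\right|^p\right] \le c(p)\, \mathbb{E}_*\left[\left(\sum_{i=0}^n |\varphi(Y_i) - G_*(X_i, \varphi)|^2\right)^{p/2}\right] .
$$

If $1 \le p \le 2$, since $\left(\sum_i a_i^2\right)^{p/2} \le \sum_i |a_i|^p$,

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n \{\varphi(Y_i) - G_*(X_i, \varphi)\}\right|^p\right] \le c(p) \sum_{i=0}^n \mathbb{E}_*\left[|\varphi(Y_i) - G_*(X_i, \varphi)|^p\right] .
$$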
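If $p > 2$, the Minkowski inequality yields

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n \{\varphi(Y_i) - G_*(X_i, \varphi)\}\right|^p\right] \le c(p) \left(\sum_{i=0}^n \mathbb{E}_*^{2/p}\left[|\varphi(Y_i) - G_*(X_i, \varphi)|^p\right]\right)^{p/2} \le 2^p c(p)\, (n+1)^{p/2-1}\, \mathbb{E}_*\left[\sum_{i=0}^n G_*(X_i, |\varphi|^p)\right] .
$$

The $f$-norm ergodic theorem (20, Theorem 14.0.1) implies that there exists a constant $C < \infty$ such that, for any initial probability measure $\nu_*$,

$$
\mathbb{E}_*\left[\sum_{i=0}^n G_*(X_i, |\varphi|^p)\right] \le (n+1)\, \pi_*\left(G_*(\cdot, |\varphi|^p)\right) + C\, \nu_*(V) .
$$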

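Combining these bounds shows that there exists a finite constant $C_1$ such that

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n \{\varphi(Y_i) - G_*(X_i, \varphi)\}\right|^p\right] \le C_1\, n^{\max(1, p/2)}\, \nu_*(V) .
$$

We now consider the second term in (63). Following the same lines as in the proof of (12, Proposition 12) and applying Burkholder's inequality for martingales (13, Theorem 2.10), there exists a constant $C_2 < \infty$ such that

$$
\mathbb{E}_*\left[\left|\sum_{i=0}^n G_*(X_i, \varphi)\right|^p\right] \le C_2\, n^{p/2}\, \nu_*(V) .
$$

The result follows.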



PROOF. [Proof of Proposition 10] The first statement follows from standard results on phi-irreducible Markov chains satisfying the Foster-Lyapunov drift condition (20). By Lemma 18, for any $x \in \mathsf{X}$ and $F \in \mathcal{G}_{W_*}$,

$$
\mathbb{E}^{Q_*}_x\left[\exp\left(\sum_{k=0}^n F(X_k)\right)\right] \le V(x)\, e^{(n+1)\left(b_* + \sup_x (F - W_*)\right)} . \tag{64}
$$

Since, under the probability $\mathbb{P}_*$, the random variables $Y_{0:n}$ are conditionally independent given $X_{0:n}$, and the conditional distribution of $Y_i$ given $X_{0:n}$ is $G_*(X_i, \cdot)$,

$$
\mathbb{E}_*\left[\exp\left(\lambda_* \sum_{k=0}^n \varphi(Y_k)\right)\right] = \mathbb{E}_*\left[\prod_{k=0}^n G_*(X_k, e^{\lambda_* \varphi})\right] = \mathbb{E}_*\left[\exp\left(\sum_{k=0}^n \log G_*(X_k, e^{\lambda_* \varphi})\right)\right] .
$$

By the Jensen inequality, $F = \log G_*(\cdot, e^{\lambda_* \varphi})$ is non-negative and belongs to $\mathcal{G}_{W_*}$; we may thus apply (64), which yields

$$
\mathbb{E}_*\left[\exp\left(\lambda_* \sum_{k=0}^n \varphi(Y_k)\right)\right] \le \nu_*(V)\, e^{(n+1)\left(b_* + \sup_x (F - W_*)\right)} .
$$

The proof then follows by applying the Markov inequality.



A Technical Results 



This section collects the proofs of some of the technical results.

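Lemma 17 For any integer $n \ge 1$ and any sequence $x = \{x_i\}_{i \ge 0} \in \{0,1\}^{\mathbb{N}}$, denote by $M_n(x) = \sum_{i=0}^{n-1} \mathbb{1}\{x_i = 1\}$ and $N_n(x) = \sum_{i=0}^{n-1} \mathbb{1}\{x_i = 1, x_{i+1} = 1\}$. Then,

$$
M_n(x) \le \frac{n+1}{2} + \frac{N_n(x)}{2} .
$$

PROOF. Denote by $\tau$ the shift operator on sequences defined, for any sequence $x = \{x_i\}_{i \ge 0}$, by $[\tau x]_k = x_{k+1}$. Since $M_n(x)$ depends only on $x_{0:n-1}$ and setting $x_j = 0$ for $j \ge n$ can only decrease $N_n(x)$, we may assume without loss of generality that $x_j = 0$ for $j \ge n$. By construction, $N_n(x) = M_n(x \text{ AND } \tau x)$. The proof then follows from the relation

$$
n \ge M_n(x \text{ OR } \tau x) = M_n(x) + M_n(\tau x) - M_n(x \text{ AND } \tau x) \ge 2M_n(x) - 1 - N_n(x) ,
$$

where AND and OR denote the componentwise inclusive "AND" and "OR".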
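Lemma 17 is a purely combinatorial statement and can be confirmed by exhaustive enumeration for small $n$. Since $M_n$ and $N_n$ depend only on $x_0, \dots, x_n$, it is enough to enumerate these coordinates. The sketch below uses the summation ranges stated in the lemma ($i = 0, \dots, n-1$ for both counts); the function names are illustrative.

```python
import itertools


def M(x, n):
    """M_n(x): number of i in {0, ..., n-1} with x_i = 1."""
    return sum(x[:n])


def N(x, n):
    """N_n(x): number of i in {0, ..., n-1} with x_i = x_{i+1} = 1 (uses x_0, ..., x_n)."""
    return sum(1 for i in range(n) if x[i] == 1 and x[i + 1] == 1)


def lemma17_holds(n_max):
    """Check 2 M_n(x) <= n + 1 + N_n(x) for every binary x_0..x_n and 1 <= n <= n_max."""
    return all(
        2 * M(x, n) <= n + 1 + N(x, n)
        for n in range(1, n_max + 1)
        for x in itertools.product((0, 1), repeat=n + 1)
    )


print(lemma17_holds(12))
```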






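Lemma 18 Assume that there exist a function $V : \mathsf{X} \to [1, \infty)$, a function $W : \mathsf{X} \to (0, \infty)$ and a constant $b < \infty$ such that

$$
\log(V^{-1} Q V) \le -W + b . \tag{A.1}
$$

Let $n$ be an integer and $F_k$, $k = 0, \dots, n-1$, be functions belonging to $\mathcal{G}_W$, where $\mathcal{G}_W$ is defined in (26). Then, for any $x \in \mathsf{X}$,

$$
\mathbb{E}^Q_x\left[\exp\left(\sum_{k=0}^{n-1} F_k(X_k)\right)\right] \le V(x)\, e^{bn + \sum_{k=0}^{n-1} \sup_x \{F_k - W\}} . \tag{A.2}
$$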



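PROOF. The proof is adapted from (17, Theorem 2.1). Set, for any integer $n$,

$$
M_n = V(X_n) \exp\left(\sum_{k=0}^{n-1} \{W(X_k) - b\}\right) . \tag{A.3}
$$

The multiplicative drift condition (A.1) implies that $\{M_n\}$ is a supermartingale. Hence, for any $n \in \mathbb{N}$ and $x \in \mathsf{X}$,

$$
\mathbb{E}^Q_x\left[V(X_n) \exp\left(-bn + \sum_{k=0}^{n-1} W(X_k)\right)\right] \le V(x) .
$$

The proof follows, since $V \ge 1$ and $F_k \le W + \sup_x\{F_k - W\}$.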
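Inequality (A.2) can be checked exactly on a finite chain. In the sketch below, $W$ is built from an arbitrary $V \ge 1$ so that (A.1) holds with equality. The functions $F_k$ are arbitrary, and membership in $\mathcal{G}_W$ from (26) is not checked, since the proof above only uses (A.1), $V \ge 1$ and $F_k \le W + \sup_x\{F_k - W\}$. All values and helper names are illustrative.

```python
import math
import random


def expected_exp_sum(Q, F, x0):
    """E_{x0}[exp(sum_{k=0}^{n-1} F_k(X_k))] for a finite chain, by backward recursion."""
    N = len(Q)
    h = [math.exp(F[-1][x]) for x in range(N)]
    for F_k in reversed(F[:-1]):
        h = [math.exp(F_k[x]) * sum(Q[x][z] * h[z] for z in range(N)) for x in range(N)]
    return h[x0]


def a2_bound(x, V, W, F, b):
    """Right-hand side of (A.2): V(x) exp(b n + sum_k sup_y {F_k(y) - W(y)})."""
    n = len(F)
    return V[x] * math.exp(b * n + sum(max(F_k[y] - W[y] for y in range(len(W))) for F_k in F))


random.seed(4)
Q = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.3, 0.1]]
V = [1.0, 2.0, 4.0]
QV = [sum(Q[x][z] * V[z] for z in range(3)) for x in range(3)]
b = max(math.log(QV[x] / V[x]) for x in range(3)) + 1.0
W = [b - math.log(QV[x] / V[x]) for x in range(3)]  # (A.1) with equality; W >= 1 > 0
F = [[random.uniform(0.0, 1.5) for _ in range(3)] for _ in range(5)]  # n = 5 functions
print([(expected_exp_sum(Q, F, x), a2_bound(x, V, W, F, b)) for x in range(3)])
```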